Annual Review Introduction


A number of themes which run through the papers
in this Research Review are present elsewhere in our
research.

One theme is decomposition. The decomposition
of algorithms and problems leads to new algorithms
for parallel machines. On the other hand, for certain
problems there are theoretical limits on decomposition,
leading to theoretical limits on speed-up for
synchronous or asynchronous parallel machines. In
speech understanding research, decomposition
permits processing by various knowledge sources
and facilitates implementation on our asynchronous
multi-processor machine. We see this as a
cost-effective way of obtaining large amounts of
computing power.

A second theme is experimentation. In computer
science we often build a system so we can
manipulate and study it. This approach helps us to
incorporate the latest technology in our design
efforts. The system must enjoy high performance to
permit enough realistic experiments.

A third theme is realism. We are concerned with
constructing as realistic models as possible, limited
only by our current understanding of the problem
and the power of our analytical tools. For example,
this is basic in our approach to the analysis of
algorithms and complexity.

I want to turn next to some very pleasant news.
Individual members of the Department have received
outstanding recognition. Al Newell and Herb Simon
share the 1975 Turing Award for their joint scientific
efforts extending over twenty years during which
they made basic contributions to artificial
intelligence, the psychology of human cognition, and
list processing. Gordon Bell has won the 1975
McDowell Award for outstanding contributions in
the areas of technical design, education, and
publications influential in developing the computer
field. Raj Reddy has won a Guggenheim Fellowship
to work on functionally specialized computer
architectures for speech and vision problems. Jack
Buchanan serves for a year as a Judicial Fellow with
the U.S. Supreme Court and the Federal Judiciary
Center to aid in modernizing the administration of
the Federal courts.

In October 1975, the Department celebrates its
tenth anniversary. Present and past members of the
Department will join in the celebration, a three day
technical symposium on research at the frontiers of
computer science. Invited and selected papers will
appear in a commemorative volume.




J.  F.  T. 
Bounds on the Speed-up of
Parallel Evaluation of Recurrences


Laurent  Hyafil  and  H.  T.  Kung 


1. Introduction

To understand the performance of parallel computers
such as ILLIAC IV and C.mmp, we must know
the largest speed-up that can be obtained for a
given task. If there are k processors, the largest
speed-up that can be achieved is k and we call this
optimal speed-up. The speed-up in general depends
on the parallel decomposition of a particular
computing task and the various aspects of the
multiprocessing system, including memory contention,
process communication, operating system overhead,
etc. In this paper, we concentrate on the issue
of decomposing tasks, and assume that the
multiprocessing system is idealized so that it causes no
delays at all. We shall show that even under this
idealized assumption, there are problems for which,
because the parallel decompositions are inherently
difficult, the optimal speed-up cannot be achieved.

This paper studies bounds on speed-ups for a
particular problem, i.e., the problem of evaluating
(or solving) recurrences, which is defined as follows:

Input: $x_0, x_{-1}, \ldots, x_{-p+1}$ and rational functions $r_i$, $i \ge 1$.

Output: $x_n$, which is defined by

$$x_i = r_i(x_{i-1}, \ldots, x_{i-p}), \quad i = 1, \ldots, n.$$

Since the $x_i$ are defined iteratively, the problem
appears on the surface to be highly serial. Hence it is
interesting to investigate how parallel algorithms can
be designed and what the theoretical limits of
using parallelism for the problem are. We consider the
recurrence problem also because it is important in
practice and is simply stated, so that we might obtain
some insight into the nature of parallel computation
by studying it. We shall survey a number of results
in connection with bounds on the speed-ups of
parallel evaluation of various kinds of recurrences,
especially when the size n of the problem is large,
or when $n \to \infty$. For simplicity we assume that each
arithmetic operation takes one unit of time. Consider
a k-processor machine. We shall see, for example,
that the speed-up for the first order linear recurrence
problem is at most (2/3)k + (1/3) even under the
idealized assumption. Of course, the actual speed-up
obtained from a real k-processor machine would
be less than (2/3)k + (1/3). The difference between
(2/3)k + (1/3) and k is rather significant. For example,
if k = 16, 64, then the speed-up for the problem is at
most 11, 43, respectively, no matter how efficient
the k-processor machine is. The reason that we get
at most 70 percent of the speed-up we might expect
for the problem is the inherent dependence of
variables in the recurrence. Nonlinear recurrences are
even worse. It is shown that the speed-ups for a
certain class of nonlinear recurrence problems are
always bounded by a constant no matter how many
processors are used and how large the size of the
problem is. Hence the dependency relationships
within the variables of these nonlinear recurrences
are even stronger. We believe that the study of these
dependency relationships is fundamental for
understanding parallel computation.
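
The figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python; the function name speedup_bound is ours and does not come from the cited papers.

```python
from fractions import Fraction

def speedup_bound(k):
    """Upper bound (2/3)k + 1/3 on the speed-up with k idealized processors."""
    return Fraction(2, 3) * k + Fraction(1, 3)

if __name__ == "__main__":
    for k in (16, 64):
        bound = speedup_bound(k)
        # bound / k is the fraction of the optimal speed-up k that remains attainable
        print(k, bound, float(bound / k))
```

For both values of k the ratio of the bound to k stays a little above 2/3, which is the "at most 70 percent" referred to in the text.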

The  kind  of  results  which  are  to  be  presented  in 
the  paper  could  be  useful  in  the  following  two  ways. 
First,  the  theoretical  bounds  on  speed-ups  provide 
grounds  for  testing  the  efficiencies  of  algorithms  and 
the  multiprocessing  system.  (For  example,  it  would 
be  very  helpful  if  tight  theoretical  bounds  on  speed- 
ups  are  known  for  benchmark  tasks.)  Second,  the 
constructions  of  the  algorithms  designed  for  the 
idealized  machine  are  instructive  and  often  lead  to 
useful  insights  into  the  nature  of  designing  efficient 
algorithms  for  real  machines. 




2.  Definitions  and  Notation 

An algorithm for evaluating $x_n$ is defined to be a
directed acyclic graph in a natural way. For example,
the graph of Figure 1 defines a parallel algorithm
using three processors for evaluating $x_3$, which is
defined by

$$x_0 = a_1,$$

$$x_i = b_i x_{i-1} + a_{i+1}, \quad i = 1, 2, 3.$$

(Note that $x_3 = ((a_1 b_1 + a_2) b_2 + a_3) b_3 + a_4$.)


Figure 1


Consider the directed graph which defines an
algorithm. We define the depth of the graph to be the
time taken by the algorithm, define

$T_k(x_n)$ = minimum time needed to evaluate $x_n$ by
an algorithm using k processors,

and define the speed-up of the problem of evaluating
$x_n$ by using k processors to be

$$S_k(x_n) = \frac{T_1(x_n)}{T_k(x_n)}.$$

(In Hyafil and Kung [75a] these definitions are given
in a more rigorous way.)

By a simple simulation argument, one can easily
see that $T_1(x_n) \le k\,T_k(x_n)$. Hence

$$S_k(x_n) \le k, \quad \forall k, \forall n.$$

Thus k is a trivial upper bound on $S_k(x_n)$. Bounds smaller
than k are nontrivial. We shall show some nontrivial
upper bounds on $S_k(x_n)$ in the following sections.


3.  First  Order  Linear  Recurrences 

A first order linear recurrence is defined by

(1) $x_i = a_i x_{i-1} + b_i$, $i \ge 1$.

It is the most fundamental recurrence, in the
sense that algorithms for solving it often form basic
algorithms for solving other types of recurrences.

The trivial algorithm which computes $x_1, x_2, \ldots, x_n$
iteratively according to (1) is the optimal sequential
algorithm, since it takes time 2n and any algorithm
has to take time at least 2n for using all the inputs.
Hence

(2) $T_1(x_n) = 2n$.
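
The operation count behind (2) can be made concrete with a short sketch. The Python below is illustrative only; the function name and the explicit operation counter are ours.

```python
def evaluate_sequentially(x0, a, b):
    """Evaluate x_i = a_i * x_{i-1} + b_i for i = 1..n.

    a and b hold a_1..a_n and b_1..b_n. Returns (x_n, number of arithmetic operations).
    """
    x, ops = x0, 0
    for a_i, b_i in zip(a, b):
        x = a_i * x + b_i   # one multiplication and one addition
        ops += 2
    return x, ops

if __name__ == "__main__":
    print(evaluate_sequentially(1, [2, 3, 4], [1, 1, 1]))
```

Each step uses the value produced by the previous step, so the 2n operations form a single chain and a second processor would have nothing to do.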

The algorithm, however, is not suitable for parallel
computers because it does not provide any
parallelism. New algorithms are needed for parallel
computers. Various parallel algorithms have been
developed by many people, including Brent [70, 74],
Kogge [74], Kogge and Stone [73], Kuck and
Maruyama [73], Kuck and Maruyama [75], Lambiotte
and Voigt [74], Stone [73, 74] and Winograd [74].
The basic idea of these algorithms can be explained
as follows:

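Note that (1) is equivalent to

$$\begin{pmatrix} x_i \\ 1 \end{pmatrix} = \begin{pmatrix} a_i & b_i \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_{i-1} \\ 1 \end{pmatrix}, \quad i \ge 1.$$

Hence

(3) $$\begin{pmatrix} x_n \\ 1 \end{pmatrix} = \begin{pmatrix} a_n & b_n \\ 0 & 1 \end{pmatrix} \cdots \begin{pmatrix} a_1 & b_1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_0 \\ 1 \end{pmatrix},$$

which can clearly be computed in parallel. Using the
fact that the multiplication of two matrices of the
form $\begin{pmatrix} x & x \\ 0 & 1 \end{pmatrix}$ takes three operations and results in a
matrix of the same form, while the multiplication of
$\begin{pmatrix} x & x \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} x \\ 1 \end{pmatrix}$ uses two operations and results in a
vector of the form $\begin{pmatrix} x \\ 1 \end{pmatrix}$, in Hyafil and Kung [75b] a
parallel algorithm based on (3) is derived and
establishes that

(4) $T_k(x_n) \le \dfrac{3n}{k + 1/2} + c_1 \log k$

for some constant $c_1 > 0$. (4) is an improvement over
the corresponding result in Winograd [74] when n is
large and k is fixed.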
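
The product form (3) can be illustrated by combining adjacent matrices pairwise, round by round. The sketch below is not the k-processor algorithm of Hyafil and Kung [75b]; it assumes as many processors as there are pairs, requires n >= 1, and the names compose and evaluate_by_tree are ours. A matrix with rows (a, b) and (0, 1) is stored as the pair (a, b).

```python
def compose(later, earlier):
    """Product of [[a2, b2], [0, 1]] and [[a1, b1], [0, 1]]: three arithmetic operations."""
    a2, b2 = later
    a1, b1 = earlier
    return (a2 * a1, a2 * b1 + b2)

def evaluate_by_tree(x0, a, b):
    """Return (x_n, rounds). The products within one round are independent of each other."""
    maps = list(zip(a, b))      # maps[i-1] represents x -> a_i * x + b_i
    rounds = 0
    while len(maps) > 1:
        paired = [compose(maps[i + 1], maps[i]) for i in range(0, len(maps) - 1, 2)]
        if len(maps) % 2:       # an unpaired last map is carried to the next round
            paired.append(maps[-1])
        maps = paired
        rounds += 1
    a_total, b_total = maps[0]
    return a_total * x0 + b_total, rounds   # final matrix-vector product: two operations

if __name__ == "__main__":
    print(evaluate_by_tree(1, [2, 3, 4], [1, 1, 1]))
```

The number of rounds grows like log n, but each pairwise product costs three operations where the sequential algorithm spends two per step. This extra work is what the lower bounds of this section quantify.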




In Hyafil and Kung [75a] it is shown that if an
algorithm computes $x_n$ in time t with w operations,
then

(5) $w \ge 3n - t/2$.

Suppose that $t < T_1(x_n) = 2n$. Then by (5) $w > 2n$.
Hence if a parallel algorithm is faster than the
optimal sequential algorithm, then it must perform
more operations than the sequential algorithm. This
turns out to be the basic reason why the optimal
speed-up cannot be achieved for the problem.
Indeed, lower bounds on $T_k(x_n)$ can be easily
derived from (5) as follows. Suppose that k processors
are used. Observe that for any algorithm, $kt \ge w$.
Hence by (5) we have

(6) $T_k(x_n) \ge \dfrac{3n}{k + 1/2}$.

Suppose that $w \ge 2^{\lfloor \log k \rfloor}$. (This is true when, say,
$n \ge k$.) In Heller [75] it is pointed out that in this case
by the same argument as used in Munro and
Paterson [73, Theorem 1], the bound in (6) can be
slightly improved. In fact, we have

$$k(t - \lfloor \log k \rfloor) + 2^{\lfloor \log k \rfloor} - 1 \ge w,$$

which together with (5) yields

(7) $T_k(x_n) \ge \dfrac{3n + k\lfloor \log k \rfloor + 1 - 2^{\lfloor \log k \rfloor}}{k + 1/2} \ge \dfrac{3n}{k + 1/2} + c_2 \log k - c_3, \quad n \ge k,$

for some constants $c_2 > 0$ and $c_3 > 0$. From (4) and
(7), we know that the bounds are essentially sharp
for $n \ge k$.

From (2), (4), (6) and (7) we have the following

Theorem 1

For the first order linear recurrence defined by (1),

(8) $$\frac{2n(k + 1/2)}{3n + c_1 (k + 1/2) \log k} \le S_k(x_n) \le \begin{cases} (2/3)k + 1/3, & \forall k, \forall n, \\[4pt] \dfrac{2n(k + 1/2)}{3n + (c_2 \log k - c_3)(k + 1/2)}, & \forall k, \forall n \ge k. \end{cases}$$

The upper bound in (8) implies that even for the
simplest recurrence defined by (1), we can get at
most 70 percent of the optimal speed-up.

The algorithm used to establish the bound in (4)
can be extended to solve first order vector linear
recurrences, defined by

(10) $X_i = A_i X_{i-1} + b_i$, $i \ge 1$,

where the x's and the b's are p-vectors and the A's
p×p matrices. The upper bounds on time for solving
these vector recurrences can similarly be obtained.

4. Pth Order Linear Recurrences

A pth order linear recurrence is defined by

(11) $x_i = \sum_{j=i-p}^{i-1} a_{ij} x_j + b_i$, $i \ge 1$.

The problem of solving such a recurrence in parallel
has been considered in Chen and Kuck [75], Kogge
[74] and Kogge and Stone [73].

The following theorem generalizes the upper
bound result in (8).

Theorem 2 (Hyafil and Kung [75b])

For the pth order linear recurrence defined by
(11),

(12) $S_k(x_n) \le (2/3)k + c_4, \quad \forall p, \forall k, \forall n,$

for some constant $c_4$.

Since 2/3 < 1, the theorem implies that we cannot
essentially obtain the optimal speed-up for solving
pth order linear recurrences for any p, when k is large.

We now consider parallel algorithms for solving
the recurrence defined by (11). The idea is to convert
it into a first order vector linear recurrence of the
form (10), which can then be solved by algorithms
used in the preceding section.

The naive approach for the conversion would be
the following way: Define vectors

$$X_i = (x_i, x_{i-1}, \ldots, x_{i-p+1})^T, \qquad b_i = (b_i, 0, \ldots, 0)^T$$

(the symbol $b_i$ is reused for the vector);


then (11) is equivalent to

(13) $X_i = A_i X_{i-1} + b_i$, $i = 1, 2, \ldots, n$,

where the $A_i$ are certain companion matrices. Then
algorithms for solving first order vector linear
recurrences can be applied to compute $X_n$ (and
hence $x_n$) from (13). We shall use another conversion
technique, which will lead us to p times faster
algorithms for the case that k, p are fixed and $n \to \infty$.
The idea is explained in the following for the case of
p = 3. We can write a 3rd order linear recurrence as

$$x_i + \sum_{j=i-3}^{i-1} a'_{ij} x_j = b_i,$$

where $a'_{ij} = -a_{ij}$. Then for computing, say, $x_6$ from
$x_0, x_{-1}, x_{-2}$ we have

$$\left(\begin{array}{ccc|ccc|ccc}
a'_{1,-2} & a'_{1,-1} & a'_{1,0} & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & a'_{2,-1} & a'_{2,0} & a'_{2,1} & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & a'_{3,0} & a'_{3,1} & a'_{3,2} & 1 & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & a'_{4,1} & a'_{4,2} & a'_{4,3} & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & a'_{5,2} & a'_{5,3} & a'_{5,4} & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & a'_{6,3} & a'_{6,4} & a'_{6,5} & 1
\end{array}\right)
\begin{pmatrix} x_{-2} \\ x_{-1} \\ x_0 \\ x_1 \\ \vdots \\ x_6 \end{pmatrix}
= \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_6 \end{pmatrix}.$$

If we partition the matrix and vectors into blocks as
indicated above, then we have

$$\begin{pmatrix} A_1 & T_1 & 0 \\ 0 & A_2 & T_2 \end{pmatrix} \begin{pmatrix} X_0 \\ X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.$$

Hence

$$X_1 = -T_1^{-1} A_1 X_0 + T_1^{-1} b_1,$$

$$X_2 = -T_2^{-1} A_2 X_1 + T_2^{-1} b_2,$$

which is a first order vector linear recurrence. Using
the same idea, for general p, we have

(14) $X_i = (-T_i^{-1} A_i) X_{i-1} + T_i^{-1} b_i$, $i = 1, 2, \ldots, m$,

where $m = \lceil n/p \rceil$, $T_i$, $A_i$, $X_i$, $b_i$ are of size p, and the $T_i$
are triangular. We shall first compute $T_i^{-1} A_i$ and
$T_i^{-1} b_i$ for $i \le m$, and then use algorithms in Section
3 to solve the recurrence (14). Since $m = \lceil n/p \rceil$, the
recurrence (14) is shorter than the recurrence (13)
by a factor of p. Thus we get faster algorithms. (It
turns out that the cost of computing $T_i^{-1} A_i$ and
$T_i^{-1} b_i$ is not crucial.) From this approach it is
immediate to prove that

(15) $T_k(x_n) \le c_5 \left( \dfrac{p^{\alpha}}{k} n + p^{\alpha} \log n \right)$,

for some constant $c_5 > 0$, where $\alpha = 2$ when the
usual matrix multiplication algorithm is used and
$\alpha = 1.82$ when Strassen's matrix multiplication
algorithm (Strassen [69]) is used. (In Hyafil and Kung
[75b] it is shown that the bound in (15) also holds for
the problem of solving an n×n band linear system with
bandwidth p.) Since $T_1(x_n) \ge (p+1)n$, taking $\alpha = 1.82$
in (15) we have that for any k and p,

(16) $S_k(x_n) \ge \dfrac{k}{c_5} \left( \dfrac{p+1}{p^{1.82}} \right)$, as $n \to \infty$,

for the problem of solving pth order linear
recurrences. Does $S_k/k$ indeed decrease as p increases?
The question is still open. We only know that by (12)
$S_k$ is always less than k for large k. We believe that
as p increases, more dependency relationships on
the x's defined by (11) will be introduced and hence
$S_k/k$ will decrease.

Conjecture

Consider the problem of solving pth order linear
recurrences defined by (11). Let the maximal speed-up
ratio achievable by using k processors be

$$S_k(p) = \max_n S_k(x_n).$$

Then there exists a monotonically decreasing
function $\lambda$ such that

$$S_k(p) \le \lambda(p)\,k, \quad \forall k,$$

and $\lambda(p) \to 0$ as $p \to \infty$.

The  following  theorem  relates  our  conjecture  on 
speed-ups  to  the  matrix  multiplication  problem. 


Theorem  3 (Hyafil  and  Kung  [75b]) 

If the conjecture is true then $O(n^2/\lambda(n))$ is a lower
bound on the number of arithmetic operations
needed to multiply two n×n matrices.

Note that the question of whether or not matrix
multiplication can be done in $O(n^2)$ operations has
been open for some years.
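
Returning to the naive conversion (13) described earlier in this section, the following sketch checks on a small instance that stepping the vector recurrence with a companion matrix reproduces the scalar pth order recurrence (11). It is an illustration only: the helper names are ours, and the layout of the companion matrix (coefficients in the first row, a shift below) is one standard choice that we assume here.

```python
def step_scalar(history, coeffs, b_i):
    """history = [x_{i-1}, ..., x_{i-p}]; coeffs = [a_{i,i-1}, ..., a_{i,i-p}]."""
    return sum(c * x for c, x in zip(coeffs, history)) + b_i

def companion(coeffs):
    """p x p companion matrix: the first row holds the coefficients, the other rows shift the vector down."""
    p = len(coeffs)
    rows = [list(coeffs)]
    for r in range(1, p):
        rows.append([1 if c == r - 1 else 0 for c in range(p)])
    return rows

def step_vector(A, X, b_i):
    """One step of X_i = A_i X_{i-1} + (b_i, 0, ..., 0)^T."""
    Y = [sum(A[r][c] * X[c] for c in range(len(X))) for r in range(len(X))]
    Y[0] += b_i
    return Y

if __name__ == "__main__":
    X = [1, 2, 3]                      # (x_0, x_{-1}, x_{-2}) for p = 3
    for coeffs, b_i in [([1, 1, 1], 1), ([2, 0, 1], 2)]:
        X = step_vector(companion(coeffs), X, b_i)
        print(X)
```

Each vector step advances the scalar sequence by one index only, whereas each step of (14) advances it by p indices. That is why (14) is shorter than (13) by a factor of p.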




5. General Linear Recurrences

A general linear recurrence is defined by

(17) $x_i = \sum_{j=0}^{i-1} a_{ij} x_j + b_i$, $i \ge 1$.

The problem of solving general linear recurrences is
reducible to that of solving triangular linear systems.
Heller [74a] first considered the problem of solving
(17) in parallel and gave algorithms which take time
$O(\log^2 n)$ and use $O(n^4)$ processors. It was shown
later that the problem in fact could be done in time
$O(\log^2 n)$ with $O(n^3)$ processors by a number of
people in at least three different ways (see, e.g.,
Borodin and Munro [75], Chen and Kuck [75], Heller
[74b] and Orcutt [74]).

For the case of using small parallelism it is shown
in Hyafil and Kung [74] that

(18) $T_k(x_n) \le \dfrac{n^2}{k} + c_6 n$ if $k \le n$,

$$T_k(x_n) \le \begin{cases} c_7\, n^{[\text{illegible}]} \log n & \text{if } k = \lceil n^r \rceil \text{ and } 1 < r < 3/2, \\ c_8\, n^{[\text{illegible}]} \log^2 n & \text{if } k = \lceil n^r \rceil \text{ and } 3/2 \le r < 3, \end{cases}$$

where $c_6$, $c_7$, $c_8$ are positive constants.

Since there are n(n+1)/2 inputs for the recurrence
(17), $T_1(x_n)$ is bounded below by a quantity of order
n(n+1)/2, while the trivial sequential algorithm
establishes that

$$T_1(x_n) \le n(n+1).$$

There is a gap between the lower and upper bounds
on $T_1(x_n)$. We believe that $T_1(x_n) = n^2 + O(n)$.
Suppose that is true. Then from (18), we have
$S_k(x_n) \to k$ as $n \to \infty$, i.e., optimal speed-up is achieved
asymptotically, which would be in interesting
contrast with pth order linear recurrences, where
optimal speed-ups are not asymptotically achieved.
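
The upper bound n(n+1) for the trivial sequential algorithm can be seen by counting operations directly. The sketch below is illustrative; the function name and the storage of the coefficients as a list of rows are our assumptions.

```python
def solve_general(x0, a, b):
    """Evaluate x_i = sum_{j<i} a_ij * x_j + b_i for i = 1..n.

    a[i-1][j] holds a_ij for 0 <= j < i, and b[i-1] holds b_i.
    Returns ([x_0, ..., x_n], number of arithmetic operations).
    """
    xs, ops = [x0], 0
    for row, b_i in zip(a, b):
        total = b_i
        for a_ij, x_j in zip(row, xs):
            total += a_ij * x_j     # one multiplication and one addition
            ops += 2
        xs.append(total)
    return xs, ops

if __name__ == "__main__":
    print(solve_general(1, [[1], [1, 1], [1, 1, 1]], [1, 1, 1]))
```

Step i costs 2i operations, so the total is 2(1 + 2 + ... + n) = n(n+1), in agreement with the bound stated above.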


6.  Nonlinear  Recurrences 

A nonlinear recurrence is defined by

(19) $x_i = \varphi(x_{i-1}, x_{i-2}, \ldots, x_{i-p})$, $i \ge 1$,

where $\varphi$ is a nonlinear rational function. Write
$\varphi = \varphi_1/\varphi_2$ where $\varphi_1$ and $\varphi_2$ are polynomials which
are relatively prime. Define the degree of a
nonlinear recurrence to be

$$\deg \varphi = \max(\deg \varphi_1, \deg \varphi_2).$$

Hence, for example, the well known recurrence,

(20) $x_{i+1} = (1/2)(x_i + A/x_i)$,

for approximating $\sqrt{A}$ has degree 2. For linear
recurrences we can have unbounded speed-up
when $k \to \infty$ and $n \to \infty$. For example, by Theorem 1
we know that if k = n the first order linear
recurrence can be sped up by a factor of n/log n, which
is unbounded as $n \to \infty$. The following theorem shows
that the theory of nonlinear recurrences of degree
> 1 is completely different from that of linear
recurrences.

Theorem  4 (Kung  [74]) 

For the recurrence defined by (19), if $\deg \varphi > 1$,

$$S_k(x_n) \le c_9, \quad \forall k, \forall n,$$

for some constant $c_9$.

The  theorem  implies  that,  e.g.  the  recurrence 
defined  by  (20)  cannot  essentially  be  sped  up  by 
using  parallelism. 
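
For concreteness, recurrence (20) can be written out as follows. This is a plain sequential sketch with a name of our choosing. It illustrates the recurrence itself, not the proof of Theorem 4.

```python
def sqrt_by_recurrence(A, x0, n):
    """Apply x_{i+1} = (1/2) * (x_i + A / x_i) n times, starting from x0 > 0."""
    x = x0
    for _ in range(n):
        x = 0.5 * (x + A / x)   # each step needs the result of the previous one
    return x

if __name__ == "__main__":
    print(sqrt_by_recurrence(2.0, 1.0, 6))
```

Every iteration depends on the full result of the one before it, and by Theorem 4 no rearrangement of the computation can improve on this chain by more than a constant factor.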

The only nonlinear recurrences which can possibly
have unbounded speed-up by using parallelism are
of the form

(21) $x_i = \left( \sum_{j=i-p}^{i-1} a_{ij} x_j + b_i \right) \Big/ \left( \sum_{j=i-q}^{i-1} c_{ij} x_j + d_i \right)$,

which is of degree one. Indeed, the recurrence

(22) $x_i = a_i + \dfrac{b_i}{x_{i-1}}$,

i.e., a continued fraction, can be sped up.

Theorem 5 (Hyafil and Kung [75b], Kogge [74],
Kuck and Maruyama [73] and Winograd [74])

For the recurrence defined by (22),

(23) $$(1/2)k + c_{10} \ge S_k(x_n) \ge \begin{cases} [\text{illegible}] \\ (2/5)k \ \text{as } n \to \infty, \ \forall k, \end{cases}$$

for some constants $c_{10}$, $c_{11}$.

By Theorem 1 and (23) we note that recurrences
with division seem to be more difficult than those
without division in parallel computation. The same
observation can also be made for the problem of
evaluating arithmetic expressions (see Brent [74]
and Winograd [74]).
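
One standard way to see why the continued fraction (22) admits parallelism is to write each step as a 2×2 matrix acting on $x_{i-1}$, much as in (3). The sketch below assumes this representation. It is not taken from the cited papers, the function names are ours, and exact rational arithmetic is used so that the two evaluations can be compared.

```python
from fractions import Fraction

def matmul2(M, N):
    """Product of two 2x2 matrices given as nested lists."""
    return [[M[0][0] * N[0][0] + M[0][1] * N[1][0], M[0][0] * N[0][1] + M[0][1] * N[1][1]],
            [M[1][0] * N[0][0] + M[1][1] * N[1][0], M[1][0] * N[0][1] + M[1][1] * N[1][1]]]

def direct(x0, a, b):
    """Evaluate x_i = a_i + b_i / x_{i-1} step by step."""
    x = Fraction(x0)
    for a_i, b_i in zip(a, b):
        x = a_i + Fraction(b_i) / x
    return x

def via_matrices(x0, a, b):
    """Same value from the product of the matrices [[a_i, b_i], [1, 0]]."""
    P = [[1, 0], [0, 1]]
    for a_i, b_i in zip(a, b):
        P = matmul2([[a_i, b_i], [1, 0]], P)   # later steps multiply on the left
    return Fraction(P[0][0] * x0 + P[0][1], P[1][0] * x0 + P[1][1])

if __name__ == "__main__":
    print(direct(1, [1, 2, 3], [1, 1, 1]), via_matrices(1, [1, 2, 3], [1, 1, 1]))
```

The loop in via_matrices is sequential, but matrix multiplication is associative, so the same product could be formed by combining adjacent pairs in parallel.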

It is clear that the recurrence

$$x_i = \frac{a_i x_{i-1} + b_i}{c_i x_{i-1} + d_i}$$

can also be sped up by using parallelism, since it can
be transformed into a continued fraction. However,
by the following theorem we know general
recurrences defined by (21) cannot essentially be sped
up.

Theorem 6 (Hyafil and Kung [75b])

For the recurrence defined by (21), if either
p > 1 or q > 1, then

$$S_k(x_n) \le c_{12}, \quad \forall k, \forall n,$$

for some constant $c_{12}$.

7. Summary and Conclusions

We have shown a number of results on the
theoretical limitation of using parallelism for solving
recurrences. For pth order linear recurrences, with k
processors the speed-ups are shown to be bounded
by ck + d for some constants c, d, with c < 1, no
matter how large the size of the problems. The sharp
upper bound is obtained for first order linear
recurrences. For nonlinear recurrences of degree > 1,
the speed-ups are shown to be bounded by a
constant, no matter how many processors are used and
how large the size of the problems. This is probably
the first and may be the only known example of a
nontrivial problem which cannot be essentially sped
up. By these results we wish to demonstrate that
the gain from parallelism very much depends upon
the nature of individual problems, e.g., the
dependency relationships among the variables of the
problems. We believe that to identify properties which
prevent us from getting good speed-ups is
fundamental for understanding parallel computation.
Overview of the Hearsay
Speech Understanding Research

Lee  D.  Erman 

Hearsay is the generic name for much of the
speech understanding research in the computer
science department at Carnegie-Mellon University
(CMU). The major goals of this research include the
investigation of computer knowledge-based
problem-solving systems and the practical implementation
of speech input to computers. An emphasis of
this effort is the design of system structures for
efficient implementation of such systems.

We will first describe the problem of speech
understanding and (in Section 2) present the
context of the Hearsay effort. Section 3 describes the
Hearsay model and implementation philosophy.
Then, in Section 4, HearsayI is described, including
some major design limitations which formed much
of the motivation for HearsayII, described in Section
5.

1. The Problem of Speech Understanding

In  order  to  provide  a framework  for  discussion,  a 
conceptual  model  of  speech  communication  is 
presented: 

1) The purpose of a speech utterance is to
transmit information from the speaker to the listener.

2) The speaker starts with some deep semantic
representation of the message. Several kinds of
transformations are applied to this representation
(syntactic, linguistic, phonological, neurological,
articulatory, acoustic, etc.). The result is an
acoustic signal.

3) The acoustic signal is detected by the listener.
The listener applies transformations which are
similar (though inverse) to those of the speaker;
the result is some semantic representation for the
listener.

4) The correctness or effectiveness of the
transmission is related to the correspondence
between the meaning that the speaker intends and
the meaning derived by the listener; it is measured
in behavioral terms, i.e., what actions of the
listener are triggered by receipt of the message.
(Very often this behavior takes the form of an
utterance generated by the original listener.)

The goal of automatic speech understanding is to
produce a machine (usually in the form of a
computer program) which can effectively perform as the
listener.

The problem of understanding speech with the
competence of a human is formidable. A reasonable
plan is to approach the most general kinds of
solutions by designing and building a sequence of
systems, each of which is more ambitious than the
previous. There are many dimensions along which
to move to provide this graded sequence (e.g.,
requirements of vocabulary size, speed of response,
accuracy, number of speakers). A way of capturing
these various dimensions is the concept of a task:
a well-defined domain within which the machine is to
perform some functions. For example, the task
might be to answer the user's (speaker's) questions
about airline flight schedules or to provide an
interactive computer-programming facility. In defining
a task, one important aspect is the spoken input
language. This language is pre-specified lexically,
syntactically, and semantically; that is, descriptions
are given of the words, how they may be sequenced
to form sentences, and the meaning of the sentences
in the context of the task.¹

There are two major aspects of the speech
communication process which generate most of the
problems in machine understanding:

1) The nature of the speech signal: The
transformations involved in speech production are
many and complex, and they strongly interact
with each other. The result is a very large amount
of variability in the signal which conveys little or
no meaning, i.e., which is noise in the context of
the speech understanding task. Repetitions of the
"same" utterance, spoken by one speaker under
unchanging conditions just seconds apart, often
result in significant variation of the signal. As the
various conditions (e.g., identity, age, gender,
emotional state, and environment of the speaker)
are relaxed and allowed to change, this variability
increases significantly. Further, strong
interactions occur among the various elements; that is,
words, phones, phrases, etc., influence and
modify nearby words, phones, phrases, etc., and
thus have differing manifestations in different
contexts.²

¹ This use of a task to constrain the problem is not
as artificial as it may first appear. Usually human
speech understanding is also performed in
"constrained domains": in almost any given situation
only a small subset of all possible messages is
likely.

2) The nature of our knowledge of the
transformations: Theories which attempt to explain the
production of speech are, in general, incomplete and
inadequate in explaining the phenomena with a
great deal of accuracy. Also, it is often difficult
to translate existing theories into the framework of
feasible recognition algorithms.

Largely because of these two aspects, the kinds of
machine speech understanding systems developed
can be characterized as having several interesting
and problem-laden features:

1) The system must make use of multiple and
diverse sources of knowledge to solve the
problem (e.g., acoustic-phonetics, phonology, syntax,
semantics, pragmatics); these knowledge sources
(KSs) correspond to the different kinds of
transformations that generate the speech signal.
Designing an effective control structure for these
many diverse KSs is crucial and difficult.

2) Each source of knowledge is incomplete and
errorful. Thus, although it is used in an attempt to
further the recognition of an utterance, each KS
will also introduce errors into the analysis process.
The different sources must work to correct each
other's mistakes in order to keep errors from
propagating excessively.

3) The systems developed tend to be large and
complex. Building, debugging, understanding,
and evaluating them is difficult. In particular, many
researchers need to interact with the system over
a period of several years, both experimenting with
its operation and modifying it. An important
aspect of system modification is the ability to
modify and replace individual KSs.

4) Because of the effectiveness and apparent ease
of human performance in the speech
understanding task, a useful solution to the problem must be
a system which approaches that performance,
primarily in terms of speed, accuracy, and,
ultimately, economy.

² We are concerned here with connected speech
input, as opposed to isolated word systems in
which the words (or short phrases treated as
indivisible units) are spoken individually.


5) Because the systems tend to be highly
experimental, they must be exercised often and over
substantial amounts of trial data. The
performance of the system while under development
(particularly in terms of speed of execution) is an
important factor in determining how much
experimentation can occur. Thus, issues of
performance are crucial even in the development stage.

6) Because the systems are complex and
experimental, the interface through which the researcher
controls and interacts with the system is crucial.
The researcher must be able to interact with the
system flexibly and at the functional level of the
system (in addition to the more traditional
machine language and programming language
levels).

This has been a short introduction to the
problems of developing speech understanding systems.
A more complete analysis of the problem, including
pointers to the relevant literature, can be found in
Newell et al. [71].

2.  Context  of  This  Work 

Hearsay's direct lineage can be traced back ten
years. The work of Reddy and Reddy & Vicens at
Stanford University (Reddy [66]; Reddy and Vicens
[68]; Vicens [69]) resulted in extending the
state-of-the-art of isolated word recognition systems (e.g.,
91% accuracy on a 561-word lexicon in ten times
real-time on a PDP-10 and with live input). This
system differed from most earlier ones, which were
essentially pattern classifiers, in that it contained a
substantial amount of speech knowledge and it used
extensive heuristics in applying the knowledge to
prune the search space. In addition, one version of
the system was created which used syntactic
constraints and operated on connected speech
(although in a very ad hoc and unextendable manner).

The  Hearsay  model  for  speech  understanding  was 
developed  at  CMU  during  1970-1971  (Reddy,  Er- 
man,  and  Neely  [70];  Reddy  [71];  Reddy,  Erman,  and 
Neely  [72]).  This  model  faced  the  problems  of 
speech  understanding  (i.e.,  in  a task  domain)  ano 
connected  speech.  The  Hearsay!  system  was 
designed  and  built  as  an  implementation  of  this 
model  (Reddy,  Erman,  and  Neely  [73];  Reddy,  Er- 
man, Fennell,  and  Neely  [73];  Neely  [73];  Erman 
[74]).  This  system,  which  was  the  first  demonstrable 
live  system  to  handle  non-trivial  connected  speech, 
became  operational  in  June,  1972,  and  has  been 
since  augmented  and  studied  (Lowerre  [75]). 
Although a number of simplifying assumptions were
made in implementing the model, HearsayI does address
the problems of connected speech and of the
role and interactions of different kinds of knowledge.
By exhibiting a successfully working system which is
based on a model and by providing a set of solutions
to these problems (even if some of the solutions are
known to be far from optimal), HearsayI clarified the
problems and serves as a basis, and encouragement,
for subsequent work.

The experience of building and experimenting
with HearsayI, together with the other research in the
field, led to a design review which resulted in the
HearsayII system (Lesser, Fennell, Erman, and Reddy
[74]). HearsayII is also based on the Hearsay
model; it generalizes and extends many of the concepts
which exist in a more simplified form in the
HearsayI system.

Concurrent  with  the  early  stages  of  the  Hearsay 
development,  a group  was  formed  by  the  Advanced 
Research  Projects  Agency  (ARPA)  to  study  the 
feasibility  of  developing  speech  understanding 
systems. This group, which included researchers active
in artificial intelligence as well as those in more
traditional directions of speech recognition
research, produced its report in May, 1971. This
report (Newell et al. [71]) provides a comprehensive
and detailed analysis of the problems involved. Part
of  this  study  included  the  specification  of  a set  of 
nineteen  dimensions  for  describing  the  capabilities 
of  a speech  understanding  system— the  first  column 
of  Figure  1 summarizes  those  dimensions. 

On  recommendation  of  the  study  group,  a five- 
year  ARPA  Speech  Understanding  Research  effort 
was  launched  in  October,  1971.  An  innovative  plan 
with  five  principal  contractors  (including  CMU)  was 
chosen;  each  was  to  aim  to  produce  a complete 
system  meeting  a set  of  specifications  laid  out  by  the 
study  group  (the  second  column  of  Figure  1)  and  all 
were to interact, exchanging ideas and data.
Although charged to meet the same set of
specifications, each group was free to choose its
own orientation (and task domains). Thus, the flavor
of each of the systems reflects the particular expertise
and motivations of the people involved.

The Hearsay research represents CMU's major efforts
to meet the ARPA specifications; in particular, it
is hoped that HearsayII will accomplish that goal. In
addition, several other systems are being
experimented with, also aiming to meet these goals:
a version of the Dragon system (Baker [75]) being
extended by Reddy and Lowerre and a combination
of HearsayI and Dragon (Lowerre [75]).

In  this  paper  we  will  describe  only  the  Hearsay  ef- 
fort. An  IEEE  symposium  on  speech  recognition  was 
held  at  CMU  in  April,  1974,  at  which  most  workers  in 
the field were represented. The contributed and invited
papers from that symposium (Erman [74b];
Reddy  [75])  provide  a comprehensive  description  of 
the  state-of-the-art  at  that  time. 


3.  The  Hearsay  Model  and  Implementation 
Philosophy 

This section describes a general model of speech
understanding, the "Hearsay model", and some of
the problems implied by that model. The following
two sections provide overviews of the HearsayI and
HearsayII implementations of that model.

As one knowledge source (KS) makes errors and
creates  ambiguities,  other  KSs  must  be  brought  to 
bear  to  correct  and  clarify  those  actions.  This  KS 
cooperation  should  occur  as  soon  as  possible  after 
the  introduction  of  an  error  or  ambiguity  in  order  to 
limit  its  ramifications.  The  mechanism  used  for 
providing  this  high  degree  of  cooperation  is  the 
hypothesize-and-test  paradigm.  In  this  paradigm, 
solution-finding  is  viewed  as  an  iterative  process. 
Two kinds of KS actions occur: a) the creation of an
hypothesis, an "educated guess" about some aspect
of the problem, and b) tests of the plausibility of the
hypothesis. For both of these steps, the KS uses a
priori knowledge about the problem, as well as the
previously generated hypotheses. This "iterative
guess-building" terminates when a consistent subset
of hypotheses is generated which satisfies some
specified requirements for an overall solution.
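
The hypothesize-and-test loop described above can be sketched in a few lines. This is a minimal illustration, not the Hearsay implementation: the function name, the toy lexicon, the acoustic scores, the grammar table, and the choice to combine ratings by multiplication are all assumptions made for the example, and the loop is greedy (it never revisits an accepted guess).

```python
from typing import Callable, List, Optional

Guess = str
Hypothesizer = Callable[[List[Guess]], List[Guess]]
Tester = Callable[[List[Guess], Guess], float]


def hypothesize_and_test(hypothesizers: List[Hypothesizer],
                         testers: List[Tester],
                         is_solution: Callable[[List[Guess]], bool],
                         max_cycles: int = 50) -> Optional[List[Guess]]:
    """Greedy iterative guess-building (hypothetical helper)."""
    accepted: List[Guess] = []
    for _ in range(max_cycles):
        if is_solution(accepted):
            return accepted
        # a) hypothesize: every KS may propose guesses in the current context
        candidates = {g for h in hypothesizers for g in h(accepted)}
        if not candidates:
            return None

        # b) test: every KS rates each guess; a rating of 0 acts as a veto
        def combined(guess: Guess) -> float:
            score = 1.0
            for t in testers:
                score *= t(accepted, guess)
            return score

        best = max(sorted(candidates), key=combined)
        if combined(best) == 0.0:
            return None
        accepted.append(best)
    return None


# Toy knowledge (all values are made up for illustration).
VOCAB = ["pawn", "to", "king", "four"]
ACOUSTIC = [{"pawn": 0.6, "four": 0.7}, {"to": 0.9},
            {"king": 0.5, "four": 0.5}, {"four": 0.8, "pawn": 0.3}]
GRAMMAR = {None: {"pawn"}, "pawn": {"to"}, "to": {"king"}, "king": {"four"}}


def lexical(accepted):
    return VOCAB


def acoustic(accepted, guess):
    return ACOUSTIC[len(accepted)].get(guess, 0.1)


def syntax(accepted, guess):
    prev = accepted[-1] if accepted else None
    return 1.0 if guess in GRAMMAR.get(prev, set()) else 0.0


result = hypothesize_and_test([lexical], [acoustic, syntax],
                              lambda acc: len(acc) == 4)
print(result)  # ['pawn', 'to', 'king', 'four']
```

In the first cycle the acoustic tester alone would prefer "four", but the syntax tester vetoes it, which is the kind of early correction by a second KS that the text describes.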

As  a strategy  for  developing  such  systems,  one 
needs  the  ability  to  add  and  replace  KSs  and  to 
explore  different  control  strategies.  Thus,  such 
changes must be relatively easy to accomplish; there
must also be ways to evaluate the performance of
the system in general and the roles of the various
KSs  and  control  strategies  in  particular.  This  ability 
to  experiment  conveniently  with  the  system  is  crucial 
if  the  amount  of  knowledge  is  large  and  many  people 
are  needed  to  introduce  and  validate  it.  One  means 
of  helping  to  provide  these  flexibilities  is  to  require 
that  KSs  be  independent;  i.e.,  the  explicit  interac- 
tions between  KSs  and  their  assumptions  about 
each  other  must  be  minimal. 

Besides  providing  for  the  modification  and 
evaluation  of  KSs,  decomposition  of  the  system  into 
relatively  independent  KSs  also  facilitates  its  im- 
plementation on  an  asynchronous  multi-processor 
machine.  Such  configurations  seem  increasingly  at- 
tractive as  cost-effective  ways  of  obtaining  large 
amounts of computing power. One problem that has
limited the development and usage of such
machines  is  the  difficulty  of  decomposing  large 
problems  for  such  machines.  Erman,  Fennell, 
Lesser,  and  Reddy  [73]  describe  this  problem  and 
outline  some  early  solutions  in  the  Hearsay  context; 
Lesser  [75]  provides  a survey  of  this  subject. 

The basic view of development of a speech understanding
system includes a strong component of
experimentation: one needs to build a system and
then manipulate and study it. In order to provide an
environment to accomplish this, a two-level approach
is taken: First, a basic set of facilities is
provided, and, second, various configurations are
built using these facilities. These facilities, which
together are called the kernel, form a problem-dependent
programming system for building and
experimenting with particular configurations. The
correct choice of kernel facilities and their implementation
are crucial ingredients in developing a
system.


Dimensions and Examples (first column)

(1) Manner of Speech
    connected? isolated words?
(2) Number of Speakers
    one? small set? open population?
(3) Dialect and Manner
    cooperative? casual? single gender?
    both genders? children? what dialect(s)?
(4) Environmental Conditions
    quiet room? computer room? factory? public place?
(5) Transducer
    high quality microphone? telephone?
(6) Speaker Tuning
    few sentences? paragraphs? full vocabulary?
(7) Speaker Training
    natural adaptation? elaborate?
(8) Vocabulary Size and Selection
    50? 200? 1,000? 10,000?
    preselected? selective rejection? free?
(9) Grammar
    fixed phrases? artificial language? free English?
    adaptable?
(10) Task
    highly constrained (e.g., simple retrieval)?
    focussed (e.g., numerical algorithms)? open?
(11) User Model
    nothing? current knowledge about the user?
(12) Mode of Interaction
    response only? ask for repetitions?
    explain language? discuss communications?
(13) Error Rate
    none (<0.1%)? <10%? >20%?
(14) Response Time
    no hurry? few times real-time? immediate?
(15) Processing Power
    1x10^6 instructions/sec? 10 mips? 100 mips?
    1000 mips?
(16) Memory Size
    1 megabit? 10mb? 100mb? 1000mb?
(17) System Organization
    simple program? multiprocessing? parallel
    processing? unidirectional processing?
    feedback? backtrack? planning?
(18) Cost
    $0.001/sec. of speech? $0.01/s? $0.1/s? $1.0/s?
(19) Operational Date

ARPA Specifications for 1976 Systems (second column)
The system should:

(1) accept connected speech
(2) from many
(3) cooperative speakers of the "general American
    dialect",
(4) in a quiet room
(5) over a good quality microphone
(6) allowing slight tuning of the system per speaker,
(7) but requiring only natural adaptation by the
    user,
(8) permitting a slightly selected vocabulary of
    1,000 words,
(9) with a highly artificial syntax,
(10) and a task with a constrained and fairly simple
    semantics,
(11) with a simple psychological model of the user,
(12) providing graceful interaction,
(13) tolerating less than 10% semantic error,
(14) in a few times real-time,
(15)
(16)
(17)
(18)
(19) and have a prototype demonstrable in 1976.

Figure 1: Dimensions of Speech Understanding Systems and ARPA Specifications for 1976.
(After Newell et al. [71].)

The  Blackboard— Representation  of  Knowledge 

The  requirement  that  KSs  be  independent  implies 
that  the  functioning  (and  very  existence)  of  each 
must  not  be  necessary  or  crucial  to  the  others.  On 
the  other  hand,  the  KSs  are  required  to  cooperate  in 
the  iterative  guess-building,  using  and  correcting 
one  another’s  guesses;  this  implies  that  there  must 
be  interaction  among  the  processes.  These  two  op- 
posing requirements  have  led  to  a design  in  which 
each  KS  interfaces  to  the  others  externally  in  a un- 
iform way  that  is  identical  across  KSs  and  in  which 
no  knowledge  source  knows  what  or  how  many  other 
KSs  exist.  The  interface  is  implemented  as  a 
dynamic  global  data  structure,  called  the 
blackboard.³ The primary units in the blackboard
are  the  guesses  about  particular  aspects  of  the 
problem— the  hypotheses.  At  any  time,  the 
blackboard  holds  the  current  state  of  the  system;  it 
contains  all  the  guesses  about  the  problem  that 
exist.  Subsets  of  hypotheses  represent  partial 
solutions  to  the  entire  problem;  these  may  compete 
with  the  partial  solutions  represented  by  other 
(perhaps  overlapping)  subsets. 

Each KS may access information in the
blackboard. Each may add information to the
blackboard by creating (or deleting) hypotheses, by
modifying existing hypotheses, and by establishing
or modifying explicit structural relationships among
hypotheses. The generation and modification of
globally accessible hypotheses is the exclusive
means of communication among the diverse KSs.
This mechanism of cooperation, which is an implementation
of the hypothesize-and-test paradigm,
allows a KS to contribute knowledge without being
aware of which other KSs will use the information or
which KS supplied the information that it used. It is in
this way that KSs are made independent and
separable. The structural relationships form a
network of the hypotheses and are used to represent
the deductions and inferences which caused a KS to
generate one hypothesis from others. The explicit
retention in the blackboard of these dependency
relationships is used to hold, among other things,
competing hypotheses.

3 The term "blackboard" was used by Simon [66] in
describing a mechanism in long-term memory as
part of a theory of the psychology of problem-solving.
Simon [71] further develops this concept
and elaborates its uses in the context of an abstract
model for problem-solving.

Because  of  the  central  importance  of  the 
blackboard,  its  design  (i.e.,  the  design  of  the  struc- 
ture of  hypotheses  and  their  relationships)  is  crucial. 
This  is  usually  called  the  problem  of  representation. 
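
As a rough illustration of this representation, the sketch below models a blackboard as a set of rated hypotheses plus explicit dependency links. The class and method names (`Blackboard`, `create`, `supports_of`, and so on) and the two toy KSs are hypothetical; the source does not specify the actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Hypothesis:
    hid: int
    label: str           # the guess itself, e.g. a word or a sound
    rating: float = 0.0  # current plausibility


@dataclass
class Blackboard:
    hyps: Dict[int, Hypothesis] = field(default_factory=dict)
    # Each link is (supporting hypothesis id, derived hypothesis id).
    links: List[Tuple[int, int]] = field(default_factory=list)
    _next_id: int = 0

    def create(self, label: str, rating: float = 0.0,
               support: Sequence[int] = ()) -> Hypothesis:
        hyp = Hypothesis(self._next_id, label, rating)
        self._next_id += 1
        self.hyps[hyp.hid] = hyp
        for s in support:
            self.links.append((s, hyp.hid))
        return hyp

    def rerate(self, hid: int, rating: float) -> None:
        self.hyps[hid].rating = rating

    def delete(self, hid: int) -> None:
        del self.hyps[hid]
        self.links = [(a, b) for a, b in self.links if hid not in (a, b)]

    def supports_of(self, hid: int) -> List[int]:
        return [a for a, b in self.links if b == hid]


# Two toy KSs. Neither refers to the other; both see only the blackboard.
def vowel_ks(bb: Blackboard) -> Hypothesis:
    return bb.create("stressed vowel", rating=0.8)


def word_ks(bb: Blackboard) -> List[Hypothesis]:
    created = []
    for hyp in list(bb.hyps.values()):
        if hyp.label == "stressed vowel":
            for word in ("king", "queen"):  # competing guesses
                created.append(bb.create("word:" + word, 0.5, support=[hyp.hid]))
    return created


bb = Blackboard()
v = vowel_ks(bb)
words = word_ks(bb)
print(len(bb.hyps), bb.supports_of(words[0].hid))  # 3 [0]
```

Because the word KS finds its context by inspecting the blackboard rather than by calling the vowel KS, either one could be replaced without changing the other.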

Activation  of  Knowledge  Sources— Focus  of  Atten- 
tion 

An  action  of  a KS  in  the  blackboard  takes  place  in 
the  context  of  some  hypotheses  already  existing  in 
the  blackboard.  For  example,  a KS  which 
hypothesizes  words  may  require  a stressed  vowel 
(as  well  as  some  surrounding  sounds)  as  its  context 
in order to consider generating new word
hypotheses. 

At  any  time  there  may  be  many  different  contexts 
which  satisfy  the  needs  of  one  or  more  KSs.  The 
problem  of  choosing  the  order  for  activating  KSs  on 
contexts  is  generally  called  the  problem  of  control 
flow.  Because  there  may  be  many  such  possible  ac- 
tivations and  because  each  activation  of  a KS  will,  in 
general, create the potential for even more activations
(e.g., the word hypothesizer, given a single
new  stressed  vowel  context,  might  hypothesize  five 
new  words  as  competing  candidates— each  of  these 
might  provide  a new  context  for  a syntactic  parser), 
the  number  of  possible  activations  may  grow. 
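
A rough count makes this growth concrete. The sketch assumes, purely for illustration, that every activation creates the same number of new contexts; the function name and the numbers are hypothetical and are not measurements from Hearsay.

```python
def potential_activations(branching: int, rounds: int) -> int:
    """Total activations if each one creates `branching` new contexts."""
    total, frontier = 0, 1
    for _ in range(rounds):
        frontier *= branching  # contexts created in this round
        total += frontier
    return total


# Five competing candidates per context, followed for four rounds.
print(potential_activations(5, 4))  # 780
```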

If  very,  very  large  amounts  of  processing  power 
(and  memory)  were  available,  one  could  consider 
actually  activating  all  KSs  in  all  their  possible  con- 
texts. This  would  expand  the  blackboard  with  many 
(competing) hypotheses. Assuming this would eventually
terminate (i.e., at some point no new contexts
are  created),  a decision  process  could  then  try  to 
pick  from  all  the  competing  hypotheses  that  subset 
which  best  describes  the  data— this  would  be  the 
system's  "solution"  to  the  problem.  Because  of  this 
combinatoric  explosion  of  possibilities  (caused 
mostly  by  the  problems  of  variability  and  in- 
completeness in  the  signal  and  errorfulness  of  the 
KSs),  this  complete  expansion  is  not  feasible. 
Therefore,  the  control  strategy  can  pick  only  a small 
subset  of  the  applicable  KS  activations;  this  can  be 
thought  of  as  exploring  a limited  portion  of  the 
(potential)  fully-expanded  blackboard.  The  problem 
of  choosing  a control  strategy  which  can  efficiently 
reach  the  correct  set  of  hypotheses  is  called  the 
attention-focusing  problem.  Its  solution  is  also 
critical for the success of a system; if portions of the
correct solution are pruned, the solution will never
be found; if many incorrect portions are not pruned,
the  combinatoric  explosion  will  use  large  amounts  of 
computing  resources  (and  may  also  force  the 
system  to  give  up  before  reaching  the  solution). 

The  problems  of  representation  of  knowledge  and 
searching  a large  solution  space  (focus  of  attention) 
are  two  of  the  central  problems  of  artificial  in- 
telligence. The  speech  understanding  problem,  with 
its  requirements  of  high  performance  and  the  use  of 
diverse  and  errorful  KSs,  provides  a rich  field  for 
their study.

4. Overview of HearsayI

The blackboard of HearsayI consists of partial
sentence  hypotheses,  each  of  which  is  a sequence  of 
words  with  non-overlapping  time  locations  in  the 
utterance.  Each  is  a partial  sentence  hypothesis 
because  not  all  of  the  utterance  need  be  described 
by  the  given  sequence  of  words.  In  particular,  gaps 
of  one  or  more  words  of  the  utterance  which  have  not 
yet been hypothesized (in the context of the particular
sentence hypothesis) are designated by
"filler" words. The partial sentence hypotheses also
contain  confidence  ratings  for  each  word  hypothesis 
and  a composite  rating  for  the  overall  sequence  of 
words.  A sentence  hypothesis  is  the  focal  point  that 
is  used  to  invoke  a KS.  The  sentence  hypothesis  also 
contains  the  accumulation  of  all  information  that  all 
KSs  have  contributed  to  that  hypothesis. 

System  activity  goes  through  a number  of  cycles. 
In  each  cycle  there  is  one  partial  sentence 
hypothesis  on  the  blackboard  which  is  the  focal 
point of activity; this focal hypothesis forms the context
for KS activity during the cycle. KSs are activated
in a lockstep sequence consisting of three
phases per cycle: poll, hypothesize, and test. At
each  phase,  all  KSs  are  activated  for  that  phase,  and 
the  next  phase  does  not  commence  until  all  KSs 
have  completed  the  current  one.  The  poll  phase  in- 
volves determining  which  KSs  have  something  to 
contribute  to  the  focal  sentence  hypothesis;  polling 
also  determines  how  confident  each  KS  is  about  its 
proposed  contributions.  The  hypothesize  phase 
consists of activating the KS showing the most confidence
about its proposed contribution of information.
This KS then hypothesizes a set of possible
words (option words) for some (one) "filler" word in
the  speech  utterance.  The  testing  phase  consists  of 
each  KS  evaluating  (verifying)  the  possible  option 
words  with  respect  to  the  given  context.  After  all  KSs 
have  completed  their  verifications,  the  option  words 
which  seem  most  likely,  based  on  the  combined 
ratings of all the KSs, are then used to construct new
partial sentence hypotheses. The blackboard is then
re-evaluated to find the most promising sentence
hypothesis; this hypothesis then becomes the focal
point for the next hypothesize-and-test cycle.

A mediator  module  (the  "recognition  overlord") 
is responsible for maintaining the blackboard, calculating
combined ratings from the ratings assigned
to hypotheses by the individual KSs, and deciding
when to stop and accept a solution (or give up). The
rating of the sentence hypotheses is the mechanism
for attention focusing. A best-first strategy is used:
the currently highest rated hypothesis is the one
used as the context for the next cycle. If an error is
made,  the  rating  of  the  incorrect  hypothesis  will, 
hopefully,  eventually  degrade  and  attention  will  be 
focused  to  the  sentence  hypothesis  which  now  has 
the  highest  rating. 
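
The poll, hypothesize, and test cycle with best-first focusing can be sketched as follows. This is an illustrative toy, not the HearsayI code: the two KS classes, the tiny set of legal sentences, the acoustic scores, and the use of a product as the composite rating are assumptions chosen to keep the example small.

```python
import heapq
from itertools import count

FILLER = None  # marks a gap that has not yet been hypothesized
LEGAL = [("pawn", "to", "king", "four"), ("knight", "to", "king", "three")]


def consistent(sent, legal):
    return all(w is FILLER or w == l for w, l in zip(sent, legal))


class AcousticKS:
    def __init__(self, scores):
        self.scores = scores  # per word position: {word: score}

    def _best_gap(self, sent):
        gaps = [i for i, w in enumerate(sent) if w is FILLER]
        return max(gaps, key=lambda i: max(self.scores[i].values()))

    def poll(self, sent):
        return max(self.scores[self._best_gap(sent)].values())

    def hypothesize(self, sent):
        pos = self._best_gap(sent)
        return pos, sorted(self.scores[pos])

    def test(self, sent, pos, word):
        return self.scores[pos].get(word, 0.1)


class SyntaxKS:
    def poll(self, sent):
        # Confident only once some word constrains the grammar.
        return 0.5 if any(w is not FILLER for w in sent) else 0.0

    def hypothesize(self, sent):
        pos = sent.index(FILLER)
        return pos, sorted({l[pos] for l in LEGAL if consistent(sent, l)})

    def test(self, sent, pos, word):
        trial = sent[:pos] + (word,) + sent[pos + 1:]
        return 1.0 if any(consistent(trial, l) for l in LEGAL) else 0.0


def recognize(kss, length, max_cycles=100):
    """Toy 'overlord': keeps rated sentence hypotheses, focuses best-first."""
    tie = count()  # tie-breaker so the heap never compares sentences
    start = (FILLER,) * length
    agenda = [(-1.0, next(tie), start)]
    seen = {start}
    for _ in range(max_cycles):
        if not agenda:
            return None
        neg_rating, _, focus = heapq.heappop(agenda)  # highest rated
        if FILLER not in focus:
            return focus, -neg_rating
        # Poll: which KS is most confident it can contribute?
        confidence, chosen = max((ks.poll(focus), i) for i, ks in enumerate(kss))
        if confidence <= 0:
            continue
        # Hypothesize: the chosen KS proposes option words for one filler.
        pos, options = kss[chosen].hypothesize(focus)
        # Test: every KS rates every option word.
        for word in options:
            rating = 1.0
            for ks in kss:
                rating *= ks.test(focus, pos, word)
            if rating == 0.0:
                continue
            new = focus[:pos] + (word,) + focus[pos + 1:]
            if new not in seen:
                seen.add(new)
                heapq.heappush(agenda, (neg_rating * rating, next(tie), new))
    return None


SCORES = [{"pawn": 0.6, "knight": 0.5}, {"to": 0.9},
          {"king": 0.7}, {"four": 0.4, "three": 0.6}]
sentence, rating = recognize([AcousticKS(SCORES), SyntaxKS()], 4)
print(sentence)  # ('knight', 'to', 'king', 'three')
```

With these made-up scores the search first pursues "pawn to king ...", whose rating then degrades when its last word is filled in, so attention shifts to the "knight" alternative, which ends with the higher composite rating.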

HearsayI contains three KSs:

1)  The  acoustic-phonetic  KS  deals  with  the  sounds 
of  the  words  of  the  input  language  and  how  they 
relate  to  the  speech  signal.  It  obtains  (from  a pre- 
processing module  called  EAR)  a representation 
of the speech signal as an errorful (of course!)
sequence of segments, each segment being
labeled with a phonetic-like label. The input language
is specified to the KS as a lexicon of words
in  which  each  word  is  "spelled"  as  a sequence  of 
phonemic  symbols  (with  some  alternative  spel- 
lings). The  KS  both  hypothesizes  words  (from  the 
segments)  and  evaluates  the  word  hypotheses  of 
other  KSs. 

2)  The  syntax  KS  deals  with  the  orderings  of  words 
in  the  utterance  according  to  the  specified  gram- 
mar of  the  input  language.  This  grammar  is  speci- 
fied to  the  KS  in  BNF  notation.  Given  some  con- 
tiguous word  hypotheses,  the  KS  can  evaluate 
them  for  consistency  with  the  grammar  and  also 
can  hypothesize  additional  words  which  are  likely 
to  occur  contiguous  to  them. 

3)  The  semantics  KS  deals  with  the  meaning  of 
words  and  phrases  of  the  input  language,  in  the 
context of the task. Only one task semantics KS
has been programmed (for "Voice-chess": playing
a game of chess verbally); its design is highly
explicit to the one task. This KS hypothesizes and
rates sentences and portions of sentences based
on the chess moves they represent; it uses both
the legality of the move in the current chess board
position as well as the "goodness" of the move (as
determined by a chess-playing program which the
KS consults).

HearsayI Performance

The HearsayI system first demonstrated live,
connected-speech recognition in June, 1972, at a
workshop held at CMU. Since that time, about two
person-years have been spent in studying it and in
improving its performance. The system has been
formally tested on a set of 144 connected speech
utterances, containing 676 word tokens, spoken by
five speakers, and consisting of four tasks (only one
of which has had a semantics component
programmed), with vocabularies ranging from 28 to
76 words. The system locates and correctly identifies
about 93% of the words, using all three of its KSs.
Without the use of the semantics KS, the accuracy
decreases to 70%. It decreases further to about 30%
when neither syntax nor semantics are used.
HearsayI operates in about 7 to 10 times real-time on a
PDP10-KA10 (0.3 million instructions/sec. machine),
using about 120K words (36-bits/word) for storage
and programs.

HearsayI Design Limitations

There are four major design decisions in the HearsayI
implementation of knowledge representation
and cooperation which make it difficult to directly extend
HearsayI to more ambitious performance goals.

The  first,  and  most  important,  of  these  limiting 
decisions concerns the use of the hypothesize-and-test
paradigm. As implemented in HearsayI, the
paradigm is exploited only at the word level. That is,
the information content of any hypothesis in the
blackboard is limited to a description at the word
level. The addition of non-word level KSs (i.e., KSs
cooperating via either sub-word levels, such as
syllables  or  phones,  or  via  supra-word  levels,  such 
as  phrases  or  concepts)  thus  becomes  cumbersome 
because  this  knowledge  must  somehow  be  related 
to  hypothesizing  and  testing  at  the  word  level. 

Secondly, HearsayI constrains the hypothesize-and-test
paradigm to operate in a lockstep control
sequence. The effect of this decision is to limit
parallelism of execution (and thus reduce effectiveness
on a multi-processor configuration); this is
because the time required to complete a
hypothesize-and-test cycle is the maximum time
required by any single hypothesizer KS plus the
maximum time required by any single verifier
(testing) KS. Another disadvantage of this control
scheme is that the time increases for the system to
refocus attention, because there is no provision for
any communication of partial results among KSs.
Thus, for example, a rejection of a particular option
word by a KS will not be noticed until all the KSs
have tested all the option words.
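
The cost of the lockstep sequence can be illustrated with a small calculation. The sketch assumes one processor per KS, that every KS takes part in both phases, and it ignores the poll phase; the timings are invented for the example.

```python
def lockstep_cycle_time(hypothesize_times, verify_times):
    """Each phase lasts as long as its slowest KS."""
    return max(hypothesize_times) + max(verify_times)


def utilization(hypothesize_times, verify_times):
    """Fraction of processor time spent working, one processor per KS."""
    n = len(hypothesize_times)
    busy = sum(hypothesize_times) + sum(verify_times)
    return busy / (n * lockstep_cycle_time(hypothesize_times, verify_times))


hyp = [2.0, 5.0, 3.0]  # hypothetical per-KS times, arbitrary units
ver = [4.0, 1.0, 6.0]
print(lockstep_cycle_time(hyp, ver))    # 11.0
print(round(utilization(hyp, ver), 2))  # 0.64
```

In this example about a third of the available processor time is spent waiting for the slowest KS in each phase.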

A third weakness in the HearsayI implementation
concerns the structure of the blackboard: there is no
provision for specifying relationships among alternative
sentence hypotheses. This absence has the
effect of increasing the overall computation time and
increasing the time to refocus attention, because the
information gained by working on one hypothesis
cannot be shared by propagating it to other relevant
hypotheses.

A fourth limiting design decision relates to how a
global problem-solving strategy is implemented in
HearsayI. The policies for attention-focusing and
control are embedded in the recognition overlord
module in an ad hoc fashion: there is no coherent
structure for the algorithms and they are "wired in"
to the kernel of the system, rather than being
available for easy manipulation and experimentation.
Thus it is awkward to modify and evaluate
policy algorithms.

5. Overview of HearsayII

HearsayII represents the step following HearsayI
in the sequence of increasingly ambitious systems
for speech understanding. The major changes to the
system structure are a) in the representation of
knowledge in the blackboard and b) in the manner of
activation and attention-focusing of KSs.

The Blackboard of HearsayII

The blackboard has been extended and
generalized to allow a) the representation of all levels
of information (acoustic, phonetic, syllabic, etc.) in
addition to the lexical and sentence levels of HearsayI
and b) the explicit representation of
relationships among hypotheses.

The blackboard is partitioned into distinct information
levels; each level is used to hold a different
(and  potentially  complete)  representation  of  the 
utterance.  Associated  with  each  level  is  a set  of 
primitive  elements  appropriate  for  representing  the 
problem  at  that  level.  (For  example,  the  elements  at 
the  lexical  level  are  the  words  of  the  vocabulary  to  be 
recognized,  while  the  elements  at  the  phonetic  level 
are  the  phones  of  English.)  Each  hypothesis  exists  at 
a particular level and is labeled as being a particular
element  of  the  set  of  primitive  elements  at  that  level. 
The  choice  of  levels  (and  the  set  of  elements  at  each 
level)  is  not  prespecified  by  the  kernel  of  the 
system.  To  the  kernel,  all  levels  are  uniform;  so  new 
ones  can  be  added  at  any  time.  The  configuration  of 
levels that is currently in use is shown in Figure 2.⁴

4 An elaboration of the following description can be
found in Shockey and Erman [74].

Parametric Level: The parametric level holds the
most basic representation of the utterance that the
system has; it is the only direct input to the
machine about the acoustic signal. Several different
sets of parameters are being used in HearsayII
interchangeably: 1/3-octave filter-band
energies measured every 10 msec., LPC-derived
vocal-tract parameters, and wide-band energies
and zero-crossing counts.

Segmental Level: This level represents the utterance
as labeled acoustic segments. Although
the set of labels is phonetic-like, the level is not
intended to be phonetic: the segmentation and
labeling reflect acoustic manifestation and do
not, for example, attempt to compensate for the
context of the segments or attempt to combine
acoustically dissimilar segments into (phonetic)
units.

Phonetic Level: At this level, the utterance is
represented by a phonetic description. This is a
broad phonetic description in that the size
(duration) of the units is on the order of the "size"
of phonemes; it is a fine phonetic description
to the extent that each element is labeled with a
fairly detailed allophonic classification (e.g.,
"stressed, nasalized [I]").

Surface-Phonemic Level: This level, named by
seemingly contradicting terms, represents the
utterance by phoneme-like units, with the addition
of modifiers, such as stress and boundary (word,
morpheme, syllable) markings.

Syllabic Level: The unit of representation here is
the syllable.

Lexical Level: The unit of information at this level
is the word.

Phrasal Level: Phrases appear at this level. In
fact, since a level may contain arbitrarily many
"sub-levels" of elements (using "links", as
described below), traditional kinds of syntactic
trees are directly represented here.

Figure 2: The Levels Currently Used in HearsayII.

The decomposition of the blackboard into distinct
levels of representation can also be thought of as an
a priori framework of a plan for problem-solving.
Each level is a generic stage in the plan. The goal at
each level is to create and validate hypotheses at
that  level.  For  example,  the  goal  at  the  phonetic  level 
is  a phonetic  transcription  of  the  utterance.  The 
overall  goal  of  the  system  is  to  create  (using  "links", 
as  described  below)  the  most  plausible  network  of 
hypotheses  that  sufficiently  covers  the  levels. 
'Plausible and sufficient' here refer to the judgment
of the KSs; 'covering the levels' means a network
that  connects  hypotheses  which  describe  the  speech 
signal  (at  the  parametric  level)  to  hypotheses  which 
describe  the  semantic  content  of  the  utterance  (at 
the  phrasal  level). 

The  decomposition  of  the  problem  space  into 
more levels than in HearsayI parallels the desire to
decompose the KSs more finely, yielding more KSs,
each of which is simpler and smaller. The principal
resultant change in the configuration of KSs is that
the single acoustic-phonetic KS of HearsayI is
decomposed into about six KSs currently in HearsayII.
For most KSs, the KS needs to deal with only
one  or  two  levels  to  apply  its  knowledge;  it  need  not 
even  be  aware  of  the  existence  of  other  levels.  Thus, 
each  KS  can  be  made  as  simple  as  its  knowledge 
allows;  its  interface  to  the  rest  of  the  system  is  in  un- 
its and  concepts  which  are  natural  to  it.  Also,  new 
levels  can  be  added  as  new  KSs  are  designed  which 
need  to  use  them.  (For  example,  the  syllabic  level 
was  a fairly  late  addition  to  the  configuration— only 
two  KSs  needed  to  be  modified  when  it  was  added.) 

Activation  of  Knowledge  Sources 

A KS  is  instantiated  as  a knowledge-source 
process  whenever  the  blackboard  exhibits 
characteristics  which  satisfy  a "precondition”  of  the 
KS.  A precondition  of  a KS  is  a description  of  some 
partial  state  of  the  blackboard  which  defines  when 
and  where  the  KS  can  contribute  its  knowledge  by 
modifying  the  blackboard.  A KS  carries  out  these  ac- 
tions with  respect  to  a particular  context,  the  context 
being  some  arbitrary  subset  of  the  previously 
generated  hypotheses  in  the  blackboard.  Thus,  new 
hypotheses  or  modifications  to  existing  hypotheses 
are  constructed  from  the  (static)  knowledge  of  the 
KS  and  the  educated  guesses  made  at  some 
previous time by other KSs.

The  modifications  made  by  any  given  KS  process 
are  expected  to  trigger  further  KSs  by  creating  new 
conditions  in  the  blackboard  to  which  those  KSs,  in 
turn,  respond.  The  structure  of  a hypothesis  is 
designed  to  allow  the  preconditions  of  most  KSs  to 
be  sensitive  to  a single,  simple  change  in  some 
hypothesis (e.g., the creation of a new hypothesis of
a particular type, a change of a rating, or the creation
of a structural link between particular kinds of
hypotheses). Through this data-directed interpretation
of the hypothesize-and-test paradigm, KSs can
also exhibit a high degree of asynchronous activity
and potential parallelism.⁵

Figure 3: The Current Knowledge Sources in HearsayII.

As examples of KSs, Figure 3 shows many of the
current set. The levels are indicated by horizontal
lines in the figure and are labeled at the left. The KSs
are indicated by arcs connecting levels; the starting
point(s) of an arc indicates the level(s) of major "input"
for the KS, and the end point indicates the "output"
level where the KS's major actions occur. In
general, the action of most of these particular KSs is
to create links between hypotheses on its input
level(s) and: 1) existing hypotheses on its output
level, if appropriate ones are already there, or 2)
hypotheses that it creates on its output level.

5 One might think of this model for data-directed
activation of KSs as a production system (Newell
[73]) which is executed asynchronously. The
preconditions correspond to the left-hand sides
(conditions) of productions, and the KSs
correspond to the right-hand sides (actions) of the
productions. Conceptually, these left-hand sides
are evaluated continuously. When a precondition
is satisfied, an instantiation of the corresponding
right-hand side of its production is created; this
instantiation is executed at some arbitrary
subsequent time (perhaps subject to instantiation
scheduling constraints).


The Segmenter-Classifier KS uses the parametric
description of the speech signal to produce a
labeled acoustic segmentation. (See Goldberg et
al. [75] for a description of the algorithm used.)
For any portion of the utterance, several possible
alternative segmentations and labels may be
produced.

The Segment Combiner combines similar adjacent
segments into larger units. It is triggered on
each new hypothesis at the segmental level.

The Phone Synthesizer uses labeled acoustic
segments to generate elements at the phonetic
level. This procedure is sometimes a fairly direct
renaming of an hypothesis at the segmental level,
perhaps using the context of adjacent segments.
In other cases, phone synthesis requires the combining
of several segments (e.g., the generation
of [t] from a segment of silence followed by a segment
of aspiration) or the insertion of phones not
indicated directly by the segmentation (e.g., hypothesizing
the existence of an [l] if a vowel seems
velarized and there is no [l] in the neighborhood).
This KS is triggered whenever a new hypothesis is
created at the segmental level.

The Word Candidate Generator uses phonetic
information (primarily just at stressed locations
and other areas of high phonetic reliability) to
generate word hypotheses. This is accomplished
in a two-stage process, with a stop at the syllabic
level, from which lexical retrieval is more effective.
(In fact, there are really two separate KSs here:
one that goes from phones to syllables, and one
that goes from syllables to words.)

The Phoneme Hypothesizer KS is activated whenever
a word hypothesis is created (at the lexical
level) which is not yet supported by hypotheses at
the surface-phonemic level. Its action is to create
one or more sequences at the surface-phonemic
level which represent alternative pronunciations
of the word. (These pronunciations are prespecified
as entries in a dictionary.) It also creates
the syllable hypotheses for the word, if they do not
already exist.

The Phone-Phoneme Synchronizer is triggered
whenever an hypothesis is created at either the
phonetic or the surface-phonemic level. This KS
attempts to link up the new hypothesis with hypotheses
at the other level. This linking may be
many-to-one in either direction.

The  Syntactic-Semantic  Parser  uses  the  syntactic 
and  semantic  definition  of  the  input  language  to 
build  parses  at  the  phrasal  level.  It  is  triggered 
by  new  word  and  phrasal  hypotheses.  This  KS  is 
not  restricted  to  left-to-right  parsing,  but  rather 
works  piecemeal  wherever  hypotheses  occur. 
One  of  its  responsibilities  is  to  identify  possible 
interpretations  for  the  entire  utterance.  (See 
Hayes-Roth  and  Mostow  [75].) 

The  Syntactic-Semantic  Hypothesizer  also  uses 
the  syntactic  and  semantic  definition  of  the  input 
language.  It  hypothesizes  phrasal  and  word 
hypotheses  which  are  likely  to  occur  adjacent 
to  phrasal  and  word  hypotheses  already  on  the 
blackboard.  This  provides  "top-down"  activity  in 
the  system. 

The Rating Policy KS operates at all levels of the
blackboard. Its function is to propagate evaluations
of hypotheses. For each hypothesis, this KS
calculates ratings which are based on a) intrinsic
ratings placed on the hypothesis by other KSs and
b) the hypothesis' relationships to other hypotheses.


Hypotheses:  Structure  and  Interrelationships 

As described above, the structure of hypotheses
at each level in the blackboard is identical (i.e., the
interpretation of hypotheses at different levels is
imposed by the KSs dealing with them). The internal
structure of an hypothesis consists of a fixed set of
attributes (i.e., fields which are named); this set is the
same for hypotheses at all levels of representation in
the blackboard. The values of the attributes are set
and modified by the KSs.

Besides holding information necessary to
describe the hypothesis, attributes also serve as
mechanisms for implementing the data-directed
hypothesize-and-test paradigm. That is, a KS can
specify particular attributes of hypotheses (usually at
particular levels) which it wants to have monitored;
whenever a change is made to one of these
monitored attributes, the KS (through its precondition)
can be activated and notified of the nature of
the change.
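Attribute monitoring of this kind can be pictured as a table of callbacks keyed by level and attribute name. The sketch below is a hypothetical Python illustration under that assumption; the names `MonitoredBlackboard`, `monitor`, and `set_attribute` are invented for the example and are not Hearsay II facilities.

```python
from collections import defaultdict


class MonitoredBlackboard:
    def __init__(self):
        # (level, attribute name) -> callbacks registered by KS preconditions
        self.monitors = defaultdict(list)
        self.hypotheses = {}  # hypothesis name -> dict of attribute values

    def monitor(self, level, attribute, callback):
        self.monitors[(level, attribute)].append(callback)

    def create(self, name, level, **attrs):
        self.hypotheses[name] = {"name": name, "level": level, **attrs}

    def set_attribute(self, name, attribute, value):
        hyp = self.hypotheses[name]
        old = hyp.get(attribute)
        hyp[attribute] = value
        if old != value:
            # Notify each interested KS of the nature of the change.
            for callback in self.monitors[(hyp["level"], attribute)]:
                callback(hyp, attribute, old, value)


notices = []
bb = MonitoredBlackboard()
bb.monitor("lexical", "rating",
           lambda hyp, attr, old, new: notices.append((hyp["name"], attr, old, new)))
bb.create("W1", "lexical", label="SHOW", rating=40)
bb.set_attribute("W1", "rating", 75)     # monitored: one notice is recorded
bb.set_attribute("W1", "label", "SHOW")  # unmonitored and unchanged: no notice
```

Only the change to the monitored attribute produces a notice, so a KS that cares about ratings at the lexical level is never woken by unrelated modifications.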

Attributes can be grouped into several classes:

— The first class of attributes names the hypothesis;
it contains the unique name of the hypothesis,
the name of its level, and its label from the
element set at that level.

— One  very  important  set  of  attributes  specifies 
structural  relationships  with  other  hypotheses, 
as  described  below. 

— The  next  class  of  attributes  is  composed  of 
parameters  which  rate  the  hypothesis.  These 
include  separate  numerical  ratings  derived 
from a) a priori information about the hypothesis
(usually placed on the hypothesis by its
creator  KS),  and  b)  information  derived  from  its 
relationships  to  other  hypotheses. 

— Another  set  of  attributes  contains  information 
about KS attention to the hypothesis. These include
suggestions (by KSs) of what type of
further processing should occur. These suggestions
are goals.

— For  speech,  time  is  a fundamental  concept,  so 
the Hearsay II system has a class of attributes
for describing the begin- and end-time and the
duration of the event which the hypothesis
represents. These attributes include ways of
explicitly representing fuzzy notions of the
times. Besides its descriptive importance, the
time attribute class is used to partition the
blackboard for efficient access; e.g., a KS can
retrieve hypotheses which overlap a particular
time region. Using both time and level, a two-dimensional
partitioning occurs.

— The capability for arbitrary KS-specific attributes
is also included. This can be used by a KS
to hold arbitrary information about the hypothesis;
in this way a KS need not hold state information
about the hypothesis internally across
activations of the KS, which allows, for example,
the implementation of generator functions. If
several KSs share knowledge of the name of
one of these attributes, each of them can access
and modify the attribute's value and thus communicate
just as if it were a "standard" attribute;
this can be used as an escape mechanism for
explicit KS intercommunication.

— A unique  class  of  hypothesis  attributes,  called 
processing  state  attributes,  contains  succinct 
summaries  and  classifications  of  the  values  of 
the other attributes. For example, the values of
the rating attributes are summarized and the
hypothesis is classified as either "unrated",
"neutral" (noncommittal), "verified", "guaranteed"
(strongly verified and unique), or "rejected".
Other processing state attributes summarize
the structural relationships with other
hypotheses and characterize, for example,
whether the hypothesis has been "sufficiently
and consistently" described as an abstraction
of hypotheses at lower levels. The processing
state attributes are especially useful for
efficiently triggering KSs; for example, a KS may
specify in its precondition that it is to be
activated whenever a hypothesis at a particular
level becomes "verified". These attributes are
also  used  for  the  goal-directed  scheduling  of 
KSs,  as  described  in  the  next  section. 

Given a specific hypothesis, a KS can examine the
value of any of its attributes. A KS also needs
the ability to retrieve sets of hypotheses whose
attributes satisfy conditions in which the KS is
interested; e.g., a KS may want to find all hypotheses
at the phonetic level which are vowels and which
occur within a particular time range. The system
provides an associative retrieval search mechanism
for accomplishing this. The search condition is
specified by a matching prototype which is a partial
specification of the components of a hypothesis.
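The vowel example can be made concrete with a small sketch of prototype matching. This is a hypothetical Python illustration: the dictionary representation, the `kind` attribute, and the special `overlaps` key for time ranges are assumptions made for the example, not the system's actual prototype format.

```python
def matches(hyp, prototype):
    """True if the hypothesis agrees with every component the prototype specifies."""
    for key, wanted in prototype.items():
        if key == "overlaps":
            # Time condition: the hypothesis must overlap the region [begin, end].
            begin, end = wanted
            if hyp["end"] < begin or hyp["begin"] > end:
                return False
        elif hyp.get(key) != wanted:
            return False
    return True


def retrieve(hypotheses, prototype):
    # Components absent from the prototype are left unconstrained.
    return [h for h in hypotheses if matches(h, prototype)]


blackboard = [
    {"name": "H1", "level": "phonetic", "kind": "vowel", "label": "AA", "begin": 10, "end": 18},
    {"name": "H2", "level": "phonetic", "kind": "consonant", "label": "T", "begin": 18, "end": 22},
    {"name": "H3", "level": "phonetic", "kind": "vowel", "label": "IY", "begin": 40, "end": 52},
    {"name": "H4", "level": "lexical", "kind": "word", "label": "TEA", "begin": 18, "end": 52},
]

# All phonetic-level vowels that overlap the time region 0..30.
vowels = retrieve(blackboard, {"level": "phonetic", "kind": "vowel", "overlaps": (0, 30)})
names = [h["name"] for h in vowels]
```

A linear scan is used here for brevity; the text indicates that the real system partitions the blackboard by time and level to make such retrieval efficient.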

Structural relationships between hypotheses in
the blackboard are represented through the use of
links; links provide a means of specifying contextual
abstractions about the relationships of hypotheses.
A link is an element of the blackboard which
associates two hypotheses as an ordered pair; one
of the nodes is termed the upper hypothesis, and the
other is called the lower hypothesis. The lower
hypothesis is said to support the upper hypothesis
while the upper hypothesis is called a use of the
lower one; in general, the lower hypothesis is at the
same or a lower level in the blackboard than the
upper hypothesis.

There  are  several  types  of  links,  with  the  types 
describing  various  kinds  of  relationships.  Consider 
this  structure: 


H1 is the upper hypothesis and H2, H3, and H4 are
the lower hypotheses of links L1, L2, and L3, respectively.
If the links are all of type OR, the interpretation
is that H1 is either an H2 or an H3 or an H4. This is
one way that alternative descriptions are possible. If
the links in the figure are of type AND, the interpretation
is that all of the lower hypotheses are necessary
to support the existence of H1. Variants of the AND-
and OR-links are also used. An important one is the
SEQUENCE link; it is similar to the AND-link except
that a contiguous time-ordering is implied on the set
of lower hypotheses supporting the upper
hypothesis: if the links in the figure are SEQUENCE
links, then H4 follows H3 which follows H2.
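The structure just described (one upper hypothesis H1 supported by H2, H3, and H4) and the three link types can be sketched as follows. This is a simplified Python illustration: the boolean `valid` flag stands in for the numerical ratings of the real system, and treating "contiguous" as "end time equals next begin time" is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hyp:
    name: str
    begin: int
    end: int
    valid: bool = True  # stand-in for a rating; hypothetical simplification


@dataclass
class Link:
    kind: str   # "OR", "AND", or "SEQUENCE"
    upper: Hyp
    lower: Hyp


def supported(upper, links):
    """Decide whether the lower hypotheses support `upper` under one link type."""
    own = [l for l in links if l.upper is upper]
    lowers = [l.lower for l in own]
    kinds = {l.kind for l in own}
    if kinds == {"OR"}:
        return any(h.valid for h in lowers)       # any one alternative suffices
    if kinds == {"AND"}:
        return all(h.valid for h in lowers)       # every lower hypothesis is needed
    if kinds == {"SEQUENCE"}:
        contiguous = all(a.end == b.begin for a, b in zip(lowers, lowers[1:]))
        return all(h.valid for h in lowers) and contiguous
    raise ValueError("mixed or unknown link types")


h2, h3, h4 = Hyp("H2", 0, 10), Hyp("H3", 10, 25, valid=False), Hyp("H4", 25, 40)
h1 = Hyp("H1", 0, 40)


def links_of(kind):
    return [Link(kind, h1, low) for low in (h2, h3, h4)]


or_ok = supported(h1, links_of("OR"))         # H2 and H4 are valid alternatives
and_ok = supported(h1, links_of("AND"))       # fails because H3 is not valid
h3.valid = True
seq_ok = supported(h1, links_of("SEQUENCE"))  # all valid, H2 then H3 then H4
```

The same three lower hypotheses give different answers under OR and AND, which is the sense in which the link type carries the interpretation.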

Besides showing structural relationships between
hypotheses (e.g., that one hypothesis is composed
of several other units), a link is a statement about the
degree to which one hypothesis implies (i.e., "gives
evidence for the existence of") another hypothesis.
The strength of the implication is held as attributes of
the link. The sense of the implication may be
negative; that is, a link may indicate that one
hypothesis is evidence for the invalidity of another.
This statement of implication may be bidirectional;
the existence of the upper hypothesis may give
credence to the existence of the lower hypothesis
and vice versa. Finally, these relationships can be
constructed in an iterative manner; links can be
added between existing hypotheses by KSs as they
discover new evidence for support.

Just  as  an  hypothesis  can  have  more  than  one 
lower  link,  so  it  can  have  several  upper  links.  Each  of 
these  represents  a different  use  of  the  hypothesis; 
the  uses  may  be  competing  or  complementary.  The 
ability  to  have  multiple  uses  and  supports  of  the 
same  hypothesis,  as  opposed  to  creating  duplicates 
for  each  competing  use  and  abstraction,  serves  to 
keep  the  blackboard  compact  and  thereby  reduces 
the  combinatoric  explosion  in  the  search  space. 
Further,  since  all  the  information  about  the 
hypothesis  is  localized,  all  uses  and  supports  of  the 
hypothesis  automatically  and  immediately  share  any 
new information added to the hypothesis by any
KSs. As changes are made to a hypothesis, some of
its uses and supports may conflict with each other; if
these conflicts become too large, a KS can decide to
resolve them by either eliminating some of the
conflicting attributes or by splitting the hypothesis into
two or more hypotheses, each of which is more
internally consistent.

Goal-Directed Scheduling of Knowledge Sources

As described earlier, the overall goal of the system
is to create the most plausible network of
hypotheses that sufficiently spans the levels. At any
instant of time, the blackboard may contain many
incomplete networks, each of which is plausible as far
as it goes. Some of these incomplete networks may
also share subnetworks. Through KS activity,
incomplete networks can be expanded (or contracted)
and may be joined together (or fragmented). At any
time, there may be many places in the blackboard
which satisfy the (precondition) contexts for the
activation of particular KSs. The task of goal-directed
scheduling is that of deciding which of these sites
should be allocated computing resources.

Several of the attribute classes of a hypothesis
can be helpful in making scheduling decisions.
Particularly valuable are the values of the attention
attributes, which, as described earlier, are indicators
telling how much computation has been expended
on the hypothesis and suggestions by KSs of how
desirable it is to devote further effort to the
hypothesis (along with the kinds of processing that
are desirable). The processing-state attributes and
the ratings are also valuable for making scheduling
decisions.

The implementation of the goal-directed
scheduling strategy is separated from the actions of
individual KSs. That is, the decision of whether a KS
can contribute in a particular context is local to the
KS, while the assignment of that KS to one of the
many contexts on which it can possibly operate is
made more globally. The three aspects of a)
decoupling of focusing strategy from KS activity, b)
decoupling of the data environment (blackboard)
from the control flow (KS activation), and c) the
limited context in which a KS operates, together
permit a quick refocusing of attention of KSs. The ability
to refocus quickly is very important because the
errorful nature of the KS activity leads to many
incomplete and possibly contradictory hypothesis
networks; thus, as soon as possible after a network
no longer seems promising, the resources of the
system should be employed elsewhere.
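One way to picture a global scheduler that is decoupled from the KSs is a priority queue of proposed activations. The sketch below is hypothetical Python: the text does not give the scheduling formula, so the score (rating plus desirability minus effort already spent) is purely an assumed stand-in, as are all names.

```python
import heapq
import itertools


class Scheduler:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving proposal order

    def propose(self, ks_name, context, rating, desirability, effort_spent):
        # Assumed score: favour well-rated sites that KSs recommend, and
        # penalise sites that have already consumed much computation.
        score = rating + desirability - effort_spent
        heapq.heappush(self._heap, (-score, next(self._counter), ks_name, context))

    def next_activation(self):
        """Return the (KS, context) pair that should receive resources next."""
        if not self._heap:
            return None
        _, _, ks_name, context = heapq.heappop(self._heap)
        return ks_name, context


sched = Scheduler()
sched.propose("parser", "W1..W3", rating=60, desirability=20, effort_spent=5)
sched.propose("phone synthesizer", "S7", rating=30, desirability=10, effort_spent=0)
sched.propose("word candidate generator", "SYL4", rating=80, desirability=10, effort_spent=40)
order = [sched.next_activation()[0] for _ in range(3)]
```

Because each KS only proposes sites and the queue decides among them, attention can be refocused simply by changing the scores, without touching any KS.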

Implementation  and  Current  Status 

Hearsay II is implemented (as was Hearsay I) on
the PDP10 in SAIL (VanLehn [73]), an extended
Algol-60. A number of language mechanisms,
particularly the flexible macro facility, are used to
extend the language to include the kernel of the
Hearsay II system; the result is a problem-oriented
programming system for writing KSs and exploring
various configurations. The major facilities provided
include:

— KS  definition  facilities, 

— blackboard  accessing  routines — both  direct 
and  associative  retrieval, 

— blackboard  modification  routines, 


— a scheduler  which  activates  KSs, 

— an  overlay  facility  which  extends  the  256K-word 
address  space  so  that  large  configurations  can 
be  used, 

— blackboard  monitoring  and  tracing  facilities, 

— general-purpose tools for experimenter interaction
with KSs, including breakpoints, execution
tracing, examination and modification of
variables, and execution of functions of the KS,

— tools  for  building  high-level  debugging  and 
interactive  features  that  are  KS-specific, 

— a package  for  graphical  output  of  blackboard 
structures, 

— a timing  package  for  determining  execution 
costs,  and 

— a means of reading "cliche" files (stored
sequences of commands used for configuring
and controlling the system).

The system that results is highly structured and has
many conventions to which the participating
researchers must adhere. This is necessary in order
to maintain a system that many people are modifying
and using concurrently. (There are currently about
five people maintaining and modifying the kernel
and approximately a dozen others experimenting
with various KS configurations; a usable and up-to-date
system must be operational at all times.)

The kernel has been operational since spring,
1974, and has gone through several major
implementation iterations. All the KSs described above
are operational; several of them represent second or
third generation versions. Because the overlay facility
has only just come up (summer, 1975), performance
of the system as a whole is still unknown; the
KSs have been developed using small configurations
at a time. It is expected that preliminary
over-all performance information will be available by
the end of 1975, but development will continue over
the foreseeable future, as long as progress continues
to be made.

Although Hearsay II is running on a uni-processor,
it is implemented using multiple processes. The
asynchronous nature of KS activation raises a
number of issues related to interaction on the
blackboard. In particular, because the execution of a
KS may be delayed for an arbitrary period following
the blackboard modification which triggered the KS,
it is possible that intervening actions (of other KSs)
may have invalidated its triggering conditions by the
time that it actually executes. Mechanisms have
been developed to handle these problems. This
aspect of the research is described in Lesser,
Fennell, Erman, and Reddy [74], Fennell [75], and
Fennell and Lesser [75].
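The cited papers describe the actual mechanisms; the text here does not. Purely to illustrate the problem, the following hypothetical Python sketch shows one simple possibility, namely re-testing a precondition immediately before a delayed KS executes. All names are invented for the example.

```python
def run_pending(blackboard, pending):
    """Run delayed KS activations, skipping any whose trigger no longer holds."""
    executed, dropped = [], []
    for name, precondition, action, context in pending:
        if precondition(blackboard, context):  # re-test just before execution
            action(blackboard, context)
            executed.append(name)
        else:
            dropped.append(name)               # invalidated by an intervening KS
    return executed, dropped


def still_verified(bb, ctx):
    return bb[ctx]["state"] == "verified"


def reject(bb, ctx):
    bb[ctx]["state"] = "rejected"


def extend(bb, ctx):
    bb[ctx]["extended"] = True


# Both KSs were triggered when W1 became "verified"; the first one to run
# rejects W1, which invalidates the trigger of the second.
blackboard = {"W1": {"state": "verified"}}
pending = [
    ("rejecter", still_verified, reject, "W1"),
    ("extender", still_verified, extend, "W1"),
]
executed, dropped = run_pending(blackboard, pending)
```

Re-validation of this kind trades some repeated precondition work for the guarantee that a KS never acts on a context that has since changed.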

The Hearsay II system also contains facilities for
simulating its execution on a multi-processor
machine. Here the issues of process interference,
resource locking (and process deadlock), and
processor utilization are met. The papers referenced
in the preceding paragraph also describe these
aspects in detail. The simulations, using just a subset
of the current KSs⁶, indicate that Hearsay II can effectively
utilize as many as twelve processors, with even
more likely as the other KSs are added and as the
scheduler is improved to reduce conflicts.

A preliminary implementation of the Hearsay II
kernel has been carried out on C.mmp (CMU's
multi-mini-processor). This has validated the
multi-processing design of the system. This implementation
has been accomplished using the L* system
(Newell and Robertson [75]). Much of the further
investigation of Hearsay II will take place in this con-

6 Only a subset of five KSs was used for these
simulations  because  a)  the  overhead  of  simulation 
is  very  high  and  b)  when  the  simulations  began 
many  of  the  current  KSs  either  did  not  exist  or 
were  too  undeveloped  to  use. 

Strict Lower and Upper Bounds
on Iterative Computational Complexity

Joseph F. Traub and Henryk Woźniakowski

1.  Introduction 

Complexity is a measure of cost. The relevant
costs depend on the model under analysis. The
costs may be taken as units of time (in parallel computation),
number of comparisons (in sorting algorithms),
size of storage (in large linear systems), or
number of arithmetic operations (in matrix multiplication).
Of course a number of different costs may be relevant
to a model.

One can analyze the complexity of an algorithm, of
a class of algorithms, or of a problem. The subject
dealing with the complexity of an algorithm is usually
called "Analysis of Algorithms." The subject dealing
with the analysis of a class of algorithms or of a problem
is called computational complexity.

Computational complexity comes in many flavors
depending on the class of algorithms, the problem,
and the costs. We limit ourselves here to mentioning
three types of computational complexity. In each of
these the costs are taken as the arithmetic operations.
Algebraic computational complexity deals
with a problem and a class of algorithms which solve
the problem at finite cost. Typically the problem
belongs to a class of problems which is indexed by
an integer n. Let $\mathrm{comp}(P_n)$ be the complexity of
solving the nth problem in the class. We are interested
in lower bounds $L(P_n)$ and upper bounds
$U(P_n)$ on $\mathrm{comp}(P_n)$,

(1.1)  $L(P_n) \le \mathrm{comp}(P_n) \le U(P_n).$

The upper bounds are obtained by exhibiting an
algorithm for solving $P_n$ with complexity $U(P_n)$.
Lower bounds are obtained by theoretical considerations
and "non-trivial" lower bounds are difficult
to obtain. For example, if $P_n$ is the problem of
multiplying two n by n matrices and if the cost of
each arithmetic operation is taken as unity then

$O(n^2) \le \mathrm{comp}(P_n) \le O(n^d), \quad d = \lg 7.$

(We use lg to represent $\log_2$.) Borodin and Munro
[75] survey the state of knowledge in algebraic complexity.
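The exponent lg 7 in the upper bound is the one obtained from a recursion that multiplies two n by n matrices using seven multiplications of half-size matrices (Strassen's algorithm, which the text does not name). The following Python sketch, with a hypothetical function name, only counts scalar multiplications for n a power of 2; it does not perform the matrix product.

```python
import math


def strassen_multiplications(n):
    """Scalar multiplications used by a seven-way recursion on n x n matrices.

    Assumes n is a power of 2 and that recursion continues down to 1 x 1 blocks.
    """
    return 1 if n == 1 else 7 * strassen_multiplications(n // 2)


n = 64
count = strassen_multiplications(n)           # 7 ** lg(n)
classical = n ** 3                            # multiplications in the usual method
exponent = math.log2(count) / math.log2(n)    # recovers lg 7, about 2.807
```

Since the count is 7 raised to the power lg n, it equals n raised to the power lg 7, which is the growth rate quoted in the upper bound above.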

Exact  solutions  of  "most"  problems  in  science, 
engineering,  and  applied  mathematics  cannot  be 
obtained with finite cost even if infinite-precision
arithmetic  is  assumed.  Indeed  linear  problems  and 
evaluation  of  rational  functions  which  can  be  solved 
at  finite  cost  are  the  exception.  Even  when  the 
problem  can  be  solved  rationally,  we  may  choose  to 
solve  it  by  iteration.  An  example  is  the  solution  of 
large  sparse  linear  systems.  Typically,  non-linear 
problems  cannot  be  solved  at  finite  cost. 

We call the branch of complexity theory that deals
with non-finite cost problems analytic computational
complexity. Often the algorithms are iterative and we
then refer to iterative computational complexity.
See Traub [75] for papers presented at a CMU Conference
on Analytic Computational Complexity.

In  this  paper  we  propose  a new  methodology  for 
iterative  computational  complexity.  Our  aim  is  to 
create  at  least  a partial  synthesis  between  iterative 
complexity  and  other  types  of  complexity. 

A basic quantity in iterative complexity has been
the efficiency index of an algorithm or class of
algorithms. In this paper we introduce a new
quantity, the complexity index, which is the reciprocal
of the efficiency index. The complexity index
is directly proportional to the complexity of an algorithm
or class of algorithms. We show under what
conditions the complexity index is a good measure
of complexity. Our methodology is non-asymptotic
in the number of iterations. Earlier analyses of complexity
applied only as the number of iterations went
to infinity, and this is of course not realistic in
practice.

We summarize the contents of this paper. In Section
2 we analyze a simplified model of the errors of
an iterative process and show that complexity is the
product of two factors, the complexity index and the
error coefficient function. Bounds on the error coefficient
function are derived in the following section
and used to derive rigorous conditions for comparing
the complexity of two different algorithms. In
Section 4 we show how the results of the simple
model can be applied to a realistic model of one-point
iteration. Lower and upper bounds on the complexity
index for several important classes of iterations
appear in Section 5. In a short concluding section
we state the extensions and generalizations to
be reported in future papers.

2.  Basic  Concepts 

We analyze algorithms for the following problem.
Let f be a non-linear real or complex scalar function
with a simple zero $\alpha$. Let $x_0$ be given and let an
algorithm $\varphi$ generate a sequence of approximations
$x_1, \ldots, x_k$ to $\alpha$. We terminate the algorithm when $x_k$ is
a sufficiently good approximation to $\alpha$. This will be
made precise below.

The  appropriate  setting  for  this  investigation  is  to 
consider  f as  a non-linear  operator  on  a Banach 
space  of  finite  or  infinite  dimension.  Since  many  of 
the basic ideas can be illustrated when f is a non-linear
scalar function we shall assume throughout
this  paper  that  this  holds.  We  must  remark  however 
that  some  of  the  most  interesting  and  important 
results  deal  with  the  dependence  of  complexity  on 
problem  dimension  and  we  do  not  deal  with  that 
dependence  here. 

Let $e_i \ge 0$ represent some measure of the error
of $x_i$. For example, $e_i$ might represent

$|x_i - \alpha|$, the absolute error,

$|x_i - \alpha| / |\alpha|$, the relative error,

$|f(x_i)|$, the residual.

Assume that the $e_i$ satisfy the error equation

(2.1)  $e_i = A_i e_{i-1}^p, \quad p \ge 1, \quad i = 1, 2, \ldots, k.$

We call p the non-asymptotic order and $A_i$ the error
coefficient. We require $0 < L \le A_i \le U < \infty$ for all
values of $e_0$ including the possibility that $e_0$ be arbitrarily
small. Then p is unique. Many iterations satisfy
the model given by (2.1). In Section 6 we mention
extensions to this model.

EXAMPLE 2.1. Let the algorithm be Newton-Raphson
iteration and let $e_i$ denote the absolute error.
Then

$p = 2, \quad A_i = \left| \frac{f''(\eta_i)}{2 f'(x_{i-1})} \right|,$

where $\eta_i$ is in the interval spanned by $\alpha$ and $x_{i-1}$.
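The order-2 error relation for Newton-Raphson iteration can be checked numerically. The sketch below is an illustration in Python under assumed choices that are not in the text: f(x) = x^2 - 2, starting point 2, and three iterations. For this f the second derivative is the constant 2, so the standard Taylor-remainder coefficient |f''(eta) / (2 f'(x_{i-1}))| reduces to 1 / (2 x_{i-1}).

```python
import math


def newton_errors(f, fprime, x0, alpha, steps):
    """Run Newton-Raphson and return the iterates and their absolute errors."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs, [abs(x - alpha) for x in xs]


f = lambda x: x * x - 2.0
fprime = lambda x: 2.0 * x
alpha = math.sqrt(2.0)

xs, errors = newton_errors(f, fprime, 2.0, alpha, 3)

# Observed error coefficients A_i = e_i / e_{i-1}^2 for i = 1, 2, 3.
observed = [errors[i] / errors[i - 1] ** 2 for i in range(1, 4)]
# Predicted coefficients 1 / (2 x_{i-1}), valid for this particular f.
predicted = [1.0 / (2.0 * xs[i - 1]) for i in range(1, 4)]
```

The coefficients change from step to step but stay bounded away from zero and infinity, which is the situation the variable error coefficient model is meant to capture. Only three steps are taken because later errors approach machine precision and the observed ratio becomes unreliable.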

We simplify the model of (2.1) and show what kind
of results may then be obtained. In Section 4 we
return to the analysis of (2.1). Let

(2.2)  $e_i = A e_{i-1}^p, \quad p \ge 1, \quad i = 1, \ldots, k.$

We call this the constant error coefficient model
while (2.1) is the variable error coefficient model.

We consider first the case p > 1. It is easy to verify
that

(2.3)  $e_i = e_0 \left( \frac{1}{w_p} \right)^{p^i - 1}, \quad i = 0, \ldots, k,$

where

(2.4)  $w_p = \frac{1}{A^{1/(p-1)} e_0}.$

Choose $\epsilon'$, $0 < \epsilon' < 1$, and let k be the smallest index
for which $e_k \le \epsilon' e_0$. Define $\epsilon$ so that

(2.5)  $e_k = \epsilon e_0.$

$\epsilon$ is a basic parameter which measures the increase
in precision to be obtained in the iteration. We
choose $\epsilon$ to avoid ceiling and floor functions later in
this paper. It is convenient to assume $\epsilon \le 2^{[\ldots]}$ (we use
this in Theorem 3.1) but this is non-restrictive in
practice.

From (2.3), (2.5),

(2.6)  $\epsilon = \left( \frac{1}{w_p} \right)^{p^k - 1},$

and it follows that

(2.7)  $k = \frac{g(w_p)}{\lg p},$

where

(2.8)  $g(w_p) = \lg\left( 1 + \frac{t}{\lg w_p} \right), \quad t = \lg(1/\epsilon).$

This is independent of the logarithm base but it is
convenient to take all logarithms to base 2. Then if $e_i$
is the relative error, t measures the number of bits to
be gained in the iteration.

We denote the complexity of iteration i by $c_i$. In
this paper we assume $c_i = c$ is independent of i. We
defer a discussion of the estimation of c until Section
5. The important case of variable cost will be considered
in a future paper. We define the complexity
of the algorithm by

(2.9)  $\mathrm{comp} = ck.$

Then from (2.7), (2.8),

(2.10)  $\mathrm{comp} = z g(w_p),$

where we define

(2.11)  $z = \frac{c}{\lg p}$

as the complexity index.


We call g the error coefficient function. Equation
(2.10) will be fundamental in our further analysis.
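The decomposition of complexity into the complexity index times the error coefficient function can be checked against a direct simulation of the constant error coefficient model e_i = A e_{i-1}^p. The Python sketch below restates the formulas it uses in comments; the numerical values (p = 2, A = 1, e_0 = 1/2, c = 3, and a precision gain of 2^-7) are illustrative assumptions chosen so that every quantity is an exact power of 2.

```python
import math


def w_p(A, e0, p):
    # w_p = 1 / (A^(1/(p-1)) * e0), defined for p > 1
    return 1.0 / (A ** (1.0 / (p - 1)) * e0)


def g(w, t):
    # error coefficient function: g(w) = lg(1 + t / lg w)
    return math.log2(1.0 + t / math.log2(w))


def complexity(c, p, A, e0, eps):
    t = math.log2(1.0 / eps)         # t = lg(1/eps)
    z = c / math.log2(p)             # complexity index z = c / lg p
    return z * g(w_p(A, e0, p), t)   # comp = z * g(w_p)


def iterations_by_simulation(p, A, e0, eps):
    """Count steps of e_i = A * e_{i-1}^p until e_k <= eps * e0."""
    e, k = e0, 0
    while e > eps * e0:
        e = A * e ** p
        k += 1
    return k


p, A, e0, c = 2, 1.0, 0.5, 3.0
eps = 2.0 ** -7   # chosen so that e_k = eps * e0 holds exactly at k = 3
comp = complexity(c, p, A, e0, eps)
k_sim = iterations_by_simulation(p, A, e0, eps)
```

With these values w_p = 2, t = 7, and g = lg 8 = 3, so the formula gives three iterations at cost 3 each, in agreement with the simulated iteration count.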

We have decomposed complexity into the product
of two factors. The complexity index, which is
independent of both the error coefficient and the starting
error, is relatively easy to compute for any given
algorithm. (However, lower bounds on the complexity
index for classes of algorithms require upper
bounds on order, which is a difficult problem only
solved for special cases (Kung and Traub [73],
Meersman [75] and Woźniakowski [75b]).) We shall
show, in a sense to be made precise in the next section,
that the error coefficient function is insensitive for a
large portion of its domain and that complexity is
determined primarily by the complexity index. We
shall also show there are cases where complexity
is determined primarily by the error coefficient
function.

The complexity index is the reciprocal of a quantity
called the efficiency index which has played an
important role in iterative complexity. See, for
example, Traub [64, Appendix C], Traub [72],
Paterson [72] and Kung [73a]. Since complexity
varies directly with the complexity index we feel that
the complexity index, rather than the efficiency index,
should be basic.

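We have been considering the case p > 1. For
completeness we write down the case p = 1. Then
$e_i = A e_{i-1}$, $i = 1, 2, \ldots, k$, and $e_k = A^k e_0 = \epsilon e_0$. Hence

(2.12)  $k = \dfrac{\lg(1/\epsilon)}{\lg(1/A)}, \qquad \mathrm{comp} = \dfrac{c\,\lg(1/\epsilon)}{\lg(1/A)}.$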


We  shall  not  pursue  the  case  p = 1 further  and  shall 
assume  for  the  remainder  of  this  paper  that  p > 1, 
unless  we  state  otherwise. 
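
As a rough numerical illustration of the case p = 1, take $A = 1/2$ and $\epsilon = 2^{-32}$. These values are chosen here for illustration and do not come from the paper. The comparison value for p = 2 assumes $g(w) = \lg(1 + t/\lg w)$ with w = 2.

```latex
% p = 1, A = 1/2, \epsilon = 2^{-32}:
k = \frac{\lg(1/\epsilon)}{\lg(1/A)} = \frac{32}{1} = 32,
\qquad \mathrm{comp} = 32\,c .
% For comparison, p = 2 with w = 2 and t = \lg(1/\epsilon) = 32:
k = \frac{\lg(1 + t/\lg w)}{\lg p} = \lg 33 \approx 5.04 .
```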

3.  Bounds  on  the  Error  Coefficient  Function 

We turn to an analysis of the error coefficient
function, which is one of the two factors that
determine the complexity in (2.10). To see which values
of $w_p$ are of interest, note that from (2.3), $e_k < e_0$ iff
$w_p > 1$. From the definition of k it is clear that $k \ge 1$
and hence from (2.7), (2.8), $w_p \le (1/\epsilon)^{1/(p-1)}$.
Hence we assume

(3.1)  $1 < w_p \le (1/\epsilon)^{1/(p-1)} = 2^{t/(p-1)}$.

Generally $w_p$ depends on p. For many classes of
iterations

(3.2)  $a^{p-1} \le A \le b^{p-1}$,

so that $1/(a e_0) \ge w_p \ge 1/(b e_0)$ and the bounds on $w_p$
are independent of p. If (3.2)
holds for a class of iterations $\Phi$ we shall say the class is
normal. An example of a normal class of iterations
may be found in Woźniakowski [75b]. To simplify
notation we shall henceforth write $w_p$ as w whether
or not we are dealing with a normal class.

Now, g(w) is a monotonically decreasing function
and

$\lim_{w \to 1^+} g(w) = \infty, \qquad \lim_{w \to \infty} g(w) = 0.$

To  study  the  size  of  g(w)  we  somewhat  arbitrarily 
divide  the  range  of  w,  given  by  (3.1),  into  three  sub- 
ranges. The  bounds  are  not  the  sharpest  we  can 
obtain. 

$1 < w \le 2$. Since $g(w) = \lg(t + \lg w) - \lg\lg w$ and
$0 < \lg w \le 1$, we conclude

$\lg t - \lg\lg w < g(w) \le \lg(1+t) - \lg\lg w$.

$2 \le w \le t$. Since $g(w) \le g(2) = \lg(1+t)$ and
$g(w) \ge g(t) \ge \lg t - \lg\lg t$, we conclude

$\lg t - \lg\lg t \le g(w) \le \lg(1+t)$.

$t \le w \le 2^{t/(p-1)}$ (assuming $2^{t/(p-1)} \ge t$). Then

$\lg p \le g(w) \le 1 + \lg t - \lg\lg t$.


To get some feel for the length of these
sub-ranges, observe that if $e_i$ represents relative error
then in single-precision computation on a "typical"
digital computer we might take $\epsilon = 2^{-32}$. Then
t = 32 and if p = 2, then $2^{t/(p-1)} = 2^{32}$.

From the bounds on the error coefficient function
and (2.10) we immediately obtain the following
bounds on complexity.

THEOREM 3.1. If $1 < w \le 2$,

(3.3)  $z(\lg t - \lg\lg w) < \mathrm{comp} \le z(\lg(1+t) - \lg\lg w)$.

If $2 \le w \le t$,

(3.4)  $z(\lg t - \lg\lg t) \le \mathrm{comp} \le z\lg(1+t)$.

If $t \le w \le 2^{t/(p-1)}$ (with $2^{t/(p-1)} \ge t$),

(3.5)  $c \le \mathrm{comp} \le z(1 + \lg t - \lg\lg t)$.
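
The three cases of the theorem can be checked numerically. The sketch below is an illustration rather than part of the paper. It assumes $g(w) = \lg(1 + t/\lg w)$ and comp = z g(w) with $z = c/\lg p$. The function names are hypothetical.

```python
import math


def lg(x):
    return math.log2(x)


def g(w, t):
    """Assumed form of the error coefficient function."""
    return lg(1.0 + t / lg(w))


def theorem_3_1_bounds(w, t, z, p):
    """Return (lower, upper) bounds on comp = z * g(w) for the three sub-ranges.

    Requires t > 1 and 1 < w <= 2**(t/(p-1)).
    """
    w_max = 2.0 ** (t / (p - 1.0))
    if not 1.0 < w <= w_max:
        raise ValueError("w outside the range (3.1)")
    if w <= 2.0:
        return z * (lg(t) - lg(lg(w))), z * (lg(1.0 + t) - lg(lg(w)))
    if w <= t:
        return z * (lg(t) - lg(lg(t))), z * lg(1.0 + t)
    # Third sub-range: the lower bound z * lg p is simply the cost c of one step.
    return z * lg(p), z * (1.0 + lg(t) - lg(lg(t)))


def check(w, t=32.0, z=1.0, p=2.0, tol=1e-9):
    """True when z * g(w) lies between the bounds (up to rounding)."""
    lower, upper = theorem_3_1_bounds(w, t, z, p)
    comp = z * g(w, t)
    return lower - tol <= comp <= upper + tol
```

For example, `check(1.001)`, `check(5.0)` and `check(1000.0)` exercise the three sub-ranges for t = 32 and p = 2.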

We discuss some of the implications of this
Theorem. As w approaches unity, then for $\epsilon$ fixed,
$\mathrm{comp} \sim -z\lg\lg w$. In this case the effect of the error
coefficient A and the initial error $e_0$ cannot be
neglected.


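Complexity depends more on the nearness of w
to unity than of $\epsilon$ to zero. To see this, observe that if
$2 \le w \le t$, $\mathrm{comp} \sim z\lg\lg(1/\epsilon) = \mathrm{comp}_1$, while if
$1 < w \le 2$, $\mathrm{comp} \sim z(\lg\lg(1/\epsilon) - \lg\lg w) = \mathrm{comp}_2$. Let
$\epsilon = 2^{-2^i}$, $w - 1 = 2^{-2^i}\ln 2$. Then $\mathrm{comp}_1 = iz$,
$\mathrm{comp}_2 \sim z(i + 2^i)$.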
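
The step behind this comparison can be written out. The following worked calculation is an illustration added here. It uses only the approximation $\ln w \approx w - 1$ for w close to 1, and the value i = 5 is an arbitrary choice.

```latex
% With \epsilon = 2^{-2^i}:
t = \lg(1/\epsilon) = 2^i, \qquad \lg\lg(1/\epsilon) = i .
% With w - 1 = 2^{-2^i}\ln 2 and \ln w \approx w - 1:
\lg w = \frac{\ln w}{\ln 2} \approx \frac{w - 1}{\ln 2} = 2^{-2^i},
\qquad -\lg\lg w \approx 2^i .
% Hence
\mathrm{comp}_1 \sim z\,i, \qquad \mathrm{comp}_2 \sim z\,(i + 2^i).
% Example: i = 5 gives comp_1 ~ 5z but comp_2 ~ 37z.
```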

Note that for any p > 1 the complexity of an
iteration can be greater than if p = 1 (see (2.12)) provided
w is sufficiently close to unity.

For any $w \ge 2$, complexity is bounded from above
by $z\lg(1+t)$ and is therefore independent of the error
coefficient A and the initial error $e_0$. For $w \ge 2$,
complexity is insensitive to w and we need only crude
bounds on w.

For $2 \le w \le t$,

$1 - \lg\lg t/\lg t \le \mathrm{comp}/(z\lg t) \le 1 + \lg(1 + t^{-1})/\lg t$.

Therefore

$1 - o(1) \le \mathrm{comp}/(z\lg t) \le 1 + o(1)$

and we conclude that on the interval [2, t] we have,
for t large, very tight bounds on comp with

(3.6)  $\mathrm{comp} \sim z\lg\lg(1/\epsilon)$.

This should be compared with the case p = 1 (see
(2.12)) where comp varies as $\lg(1/\epsilon)$.

We have taken w = 2 as one of our endpoints for
convenience but this is of course arbitrary. Any value
of w sufficiently far from unity will do. If $w = 2^{1/\delta}$
then $g(w) = \lg(1 + \delta t)$. Then the effects of the nearness
of w to unity and of $\epsilon$ to zero are equal if $\delta = t$, that is
if $w = 2^{1/t}$. For this choice of w,
$\mathrm{comp} = z\lg(1 + t^2) \sim 2z\lg t = 2z\lg\lg(1/\epsilon)$.

We have chosen the sub-ranges of w so that the
endpoints are simple. We could also choose values
of w that make the complexity formula simple. If

$w = 2^{t/(t^{\mu}-1)}$, $\mu > 1$, then $\mathrm{comp} = \mu z\lg\lg(1/\epsilon)$,

while if

$w = 2^{t/(t^{1/\nu}-1)}$, $\nu > 1$, then $\mathrm{comp} = (1/\nu)z\lg\lg(1/\epsilon)$.

We now consider the methodology for comparing
two iterations which are governed by the constant
error coefficient model (2.2) and decrease the final error
by the same $\epsilon$. Let $w_i$, $z_i$, $\mathrm{comp}_i$, i = 1, 2 denote the
parameters of the two iterations. Then

$\dfrac{\mathrm{comp}_1}{\mathrm{comp}_2} = \dfrac{z_1}{z_2}\,\dfrac{g(w_1)}{g(w_2)}$.

Clearly if $z_1 \le z_2$ and $w_1 \ge w_2$ then $\mathrm{comp}_1 \le \mathrm{comp}_2$.
We obtain bounds on $\mathrm{comp}_1/\mathrm{comp}_2$ for
sub-ranges of the $w_i$. Using the bounds on complexity
from the previous theorem we obtain


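THEOREM 3.2. If $1 < w_1, w_2 \le 2$, then

(3.7)  $\dfrac{z_1}{z_2}\,\dfrac{\lg t - \lg\lg w_1}{\lg(1+t) - \lg\lg w_2} < \dfrac{\mathrm{comp}_1}{\mathrm{comp}_2} < \dfrac{z_1}{z_2}\,\dfrac{\lg(1+t) - \lg\lg w_1}{\lg t - \lg\lg w_2}$.

If $1 < w_2 \le 2 \le w_1 \le t$, then

(3.8)  $\dfrac{z_1}{z_2}\,\dfrac{\lg t - \lg\lg t}{\lg(1+t) - \lg\lg w_2} \le \dfrac{\mathrm{comp}_1}{\mathrm{comp}_2} < \dfrac{z_1}{z_2}\,\dfrac{\lg(1+t)}{\lg t - \lg\lg w_2}$.

If $2 \le w_1, w_2 \le t$, then

(3.9)  $\dfrac{z_1}{z_2}\,\dfrac{\lg t - \lg\lg t}{\lg(1+t)} \le \dfrac{\mathrm{comp}_1}{\mathrm{comp}_2} \le \dfrac{z_1}{z_2}\,\dfrac{\lg(1+t)}{\lg t - \lg\lg t}$.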


We discuss some of the implications of this
theorem. As $t \to \infty$, $\mathrm{comp}_1/\mathrm{comp}_2 \to z_1/z_2$ for
fixed values of $w_1$, $w_2$. The ratio $z_1/z_2$ has been the
way that iterations have been compared (see Traub
[64, Appendix C] where efficiency indices are used).
Theorem 3.2 shows that $z_1/z_2$ can be a very poor
measure of $\mathrm{comp}_1/\mathrm{comp}_2$; see for example (3.7).
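
A small numerical experiment shows how misleading the index ratio can be when one iteration starts with w close to unity. This sketch is illustrative only. It assumes comp = z g(w) with $g(w) = \lg(1 + t/\lg w)$, and the parameter values for the two hypothetical iterations are invented for the example.

```python
import math


def lg(x):
    return math.log2(x)


def comp(z, w, t):
    """Complexity z * g(w) under the constant error coefficient model."""
    return z * lg(1.0 + t / lg(w))


t = 32.0                      # bits to be gained, i.e. eps = 2**-32
z1, w1 = 1.0, 1.0001          # iteration 1: smaller index, w close to unity
z2, w2 = 1.5, 4.0             # iteration 2: larger index, w well above unity

index_ratio = z1 / z2                            # what the index comparison predicts
comp_ratio = comp(z1, w1, t) / comp(z2, w2, t)   # what the model actually gives
```

Here the index ratio is below 1, suggesting that iteration 1 is cheaper, while the complexity ratio is well above 1.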

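Finally we observe that inequalities (3.7)-(3.9) can
be rewritten to show when $\mathrm{comp}_1 \le \mathrm{comp}_2$ or
$\mathrm{comp}_2 \le \mathrm{comp}_1$. For example, if $2 \le w_1, w_2 \le t$ and

(3.10)  $z_1 \le z_2\,\dfrac{\lg t - \lg\lg t}{\lg(1+t)}$, then $\mathrm{comp}_1 \le \mathrm{comp}_2$.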

4.  The  Variable  Error  Coefficient  Model 

We turn to the variable error coefficient model,

(4.1)  $e_i = A_i e_{i-1}^p$.

A complete analysis of this model is beyond the
scope of this paper. Here we confine ourselves to
the very simple assumption

(4.2)  $A_L \le A_i \le A_U$, $i = 1, \ldots, k$.

Let

$w_L = \dfrac{1}{A_L^{1/(p-1)} e_0}, \qquad w_U = \dfrac{1}{A_U^{1/(p-1)} e_0}$.

Then

(4.3)  $z\,g(w_L) \le \mathrm{comp} \le z\,g(w_U)$.

Note that $w_U \le w_L$ and therefore (4.3) is compatible
with g being a monotonically decreasing function.
We can now draw conclusions from the constant
coefficient model with A replaced by $A_L$ or $A_U$.
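
The bracketing in (4.3) can be illustrated by simulation. The sketch below is not from the paper. It draws each $A_i$ uniformly from $[A_L, A_U]$, which is an arbitrary choice, and assumes $g(w) = \lg(1 + t/\lg w)$. Because a simulation counts whole steps, the step count is compared with the model bounds rounded up to integers.

```python
import math
import random


def lg(x):
    return math.log2(x)


def g(w, t):
    return lg(1.0 + t / lg(w))


def simulate_variable_model(A_L, A_U, p, e0, eps, rng):
    """Steps of e_i = A_i * e_{i-1}**p, A_i uniform in [A_L, A_U],
    until e_i <= eps * e0."""
    e, k = e0, 0
    while e > eps * e0:
        e = rng.uniform(A_L, A_U) * e ** p
        k += 1
    return k


# Illustrative parameters (assumptions); A_U * e0 < 1 so every run converges.
A_L, A_U, p, e0, eps = 0.1, 2.0, 2.0, 0.4, 2.0 ** -32
t = lg(1.0 / eps)
w_L = 1.0 / (A_L ** (1.0 / (p - 1.0)) * e0)
w_U = 1.0 / (A_U ** (1.0 / (p - 1.0)) * e0)
k_low = math.ceil(g(w_L, t) / lg(p))     # fewest whole steps allowed by (4.3)
k_high = math.ceil(g(w_U, t) / lg(p))    # most whole steps allowed by (4.3)
runs = [simulate_variable_model(A_L, A_U, p, e0, eps, random.Random(seed))
        for seed in range(20)]
```

With these values every simulated run needs between `k_low` and `k_high` steps.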

EXAMPLE 4.1. Let $\alpha$ be a real zero of f and let J denote
an interval centered at $\alpha$. Assume f' does not vanish
in J and let $x_0 \in J$ be such that

$e_0 = |x_0 - \alpha| \le \dfrac{\min_{x \in J} |f'(x)|}{\max_{x \in J} |f''(x)|}$.

Then by Example 2.1, for Newton-Raphson iteration,
$w_U \ge 2$ and a priori

(4.4)  $\mathrm{comp} \le c\lg(1+t)$.

The value of c is discussed in Section 5. Note that a
sufficient condition for convergence is

$e_0 < 1/A_U$

but with only this condition, complexity could be
extremely large.


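EXAMPLE 4.2. We seek to calculate $a^{1/2}$, that is,
solve $f(x) = x^2 - a$. Let $a = 2^m\lambda^2$, m even, $1/2 \le \lambda^2 < 2$.
Then $a^{1/2} = 2^{m/2}\lambda$, $(1/2)^{1/2} \le \lambda < 2^{1/2}$. We use
Newton-Raphson iteration,

$x_{i+1} = \tfrac{1}{2}\left(x_i + \dfrac{\lambda^2}{x_i}\right)$.

Then $A_i = 1/(2x_{i-1})$. If $x_0 > \lambda$, then

$A_L = 1/(2x_0) \le A_i < 1/(2\lambda) = A_U$, $i = 1, \ldots, k$.

Hence

$w_L = \dfrac{2}{1 - \lambda/x_0}, \qquad w_U = \dfrac{2\lambda/x_0}{1 - \lambda/x_0}$.

Let $x_0 = 2^{1/2}$. Then $w_U \ge 2$ and $\mathrm{comp} \le c\lg(1+t)$.
To derive a lower bound on complexity one must
make an assumption about the closest
machine-representable number to $2^{1/2}$. We do not pursue
that here.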
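
The a priori bound for the square root can be tried out in floating point. This is a minimal sketch under stated assumptions: the error is measured as $x_i - \lambda$ against `math.sqrt`, the starting value is $x_0 = 2^{1/2}$, the target is a reduction of the initial error by $\epsilon = 2^{-t}$, and each step is taken to cost c = 1. The function name is hypothetical.

```python
import math


def newton_sqrt_steps(lam_squared, t):
    """Newton-Raphson steps for x**2 = lam_squared, starting at sqrt(2),
    until the error x_k - lambda has dropped by the factor 2**-t.

    lam_squared must lie in [1/2, 2) so that lambda < sqrt(2).
    """
    if not 0.5 <= lam_squared < 2.0:
        raise ValueError("lam_squared must lie in [1/2, 2)")
    lam = math.sqrt(lam_squared)   # reference value, used only to measure error
    x = math.sqrt(2.0)
    e0 = x - lam
    k = 0
    while x - lam > 2.0 ** -t * e0:
        x = 0.5 * (x + lam_squared / x)
        k += 1
    return k


t = 32
# Whole-step version of comp <= c * lg(1 + t) with c = 1.
step_bound = math.ceil(math.log2(1 + t))
steps = {a: newton_sqrt_steps(a, t) for a in (0.5, 0.75, 1.0, 1.5, 1.99)}
```

For t = 32 the bound allows at most 6 whole steps, and each of the sample values stays within it.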


5.  Bounds  on  the  Complexity  Index 

We have shown that provided w is not too close
to unity, then for fixed $\epsilon$, complexity depends only
on the complexity index z. In this section we turn our
attention to the complexity index.

Recall that $z = c/\lg p$. We begin our analysis of z
by considering the cost per step, c. We distinguish
between two kinds of problems.

We say a problem is explicit if the formula for f
is given explicitly. For example, the calculation of $a^{1/2}$
by solving $f = x^2 - a$ is an explicit problem. The
complexity of explicit problems has been studied by
Paterson [72] and Kung [73a], [73b]. (Paterson and
Kung take the efficiency index as basic.) We do not
treat explicit problems here.

We say a problem is implicit if all we know about f
are certain functionals of f. Classically the
functionals are f and its derivatives evaluated at certain
points. These functionals may be thought of as black
boxes which deliver an output for any input.
Kacewicz [75] has shown that integral functionals are
of interest. The question of what functionals may be
used in the solution of a problem is beyond the
scope of this paper. We confine ourselves to implicit
problems for the remainder of this paper.

We assume the same set of functionals is used at
each step of the iteration. The set of functionals
used by an iteration algorithm $\varphi$ is called the
information set N. Woźniakowski [75a] gives many
examples of N. Let the information complexity
u = u(f, N) be the cost of evaluating functionals in the
information set N and let the combinatory complexity
$d = d(\varphi)$ be the cost of combining functionals (see
Kung and Traub [74b]). We assume that each
arithmetic operation costs unity and denote the
number of operations for one evaluation of $f^{(j)}$ by
$c(f^{(j)})$. The following simple example may serve to
illustrate the definition.

EXAMPLE 5.1. Let $\varphi$ be Newton-Raphson iteration

$x_{i+1} = \varphi(x_i) = x_i - f(x_i)/f'(x_i)$, $i = 0, \ldots, k-1$. Then

$N = \{f(x_i), f'(x_i)\}$, $u(f, N) = c(f) + c(f')$, $d(\varphi) = 2$.

Up to this point we have illustrated the concepts
with algorithms. Computational complexity deals
with classes of algorithms and we turn to our central
concern, lower and upper bounds on classes of
algorithms. As usual the difficult problem is
obtaining lower bounds. Good lower bounds may be
obtained from good lower bounds on cost and good
upper bounds on order. The problem of maximal
order is a difficult one about which a great deal has
been recently learned (Meersman [75],
Woźniakowski [75a], [75b]). Part of the mathematical
difficulty of the subject deals with the problem of
maximal order. Note however that maximal order
does not necessarily minimize complexity; we deal
with this in a future paper. Upper bounds are
obtained from algorithms. An interesting question
here is a good upper bound on the combinatory
complexity of a class of algorithms. Brent and Kung
[75] have obtained a surprising new upper bound,
O(r lg n), on the combinatory complexity of a family
of nth order one-point iterations based on inverse
interpolation.

It  is  convenient  to  index  our  algorithms  by  n,  the 
number  of  elements  in  the  information  set  N.  We 
illustrate  the  issues  with  two  examples. 


EXAMPLE 5.2. Let $\varphi_n$ denote any one-point
iteration with $N = \{f(x_i), f'(x_i), \ldots, f^{(n-1)}(x_i)\}$. Let
$c_f = \min_j c(f^{(j)})$. Then $u(f, N) \ge n c_f$. For simplicity we use the
linear lower bound $d(\varphi_n) \ge n - 1$. (No non-linear
lower bound is known.) A sharp upper bound on the
order of one-point iteration (Traub [64], Kung and
Traub [74a]) is $p \le n$. Hence

$z(\varphi_n, f) \ge \dfrac{n c_f + n - 1}{\lg n} \ge \dfrac{3c_f + 2}{\lg 3}$

provided only that $c_f \ge 4$ (Kung and Traub [74b]).
Hence for any one-point iteration with $w_L \le t$,

(5.1)  $\mathrm{comp} \ge \dfrac{3c_f + 2}{\lg 3}\,(\lg t - \lg\lg t)$.

On the other hand there exists a one-point iteration $\varphi$
which uses f, f', f'' and such that p = 3. Hence if
$w_U \ge 2$,

(5.2)  $\mathrm{comp} \le \dfrac{c(f) + c(f') + c(f'') + d(\varphi)}{\lg 3}\,\lg(1+t)$.

For problems such that $c(f) = c(f') = c(f'') = c_f$ the
lower and upper bounds of (5.1) and (5.2) are close
together.
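
The claim that the lower bound on the complexity index is smallest at n = 3 can be checked by brute force. This is a small sketch added for illustration. The function names and the search limit `n_max` are hypothetical.

```python
import math


def z_lower_bound(n, c_f):
    """(n*c_f + n - 1) / lg n: lower bound on the complexity index of a
    one-point iteration that uses n >= 2 pieces of information."""
    return (n * c_f + n - 1) / math.log2(n)


def best_n(c_f, n_max=64):
    """Value of n in 2..n_max that minimizes the lower bound."""
    return min(range(2, n_max + 1), key=lambda n: z_lower_bound(n, c_f))


minimizers = {c_f: best_n(c_f) for c_f in (4, 10, 100, 1000)}
```

For each of these costs the minimizer is n = 3, where the bound equals $(3c_f + 2)/\lg 3$.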


EXAMPLE 5.3. Kung and Traub [74a] show there
exists an iteration $\psi_n$ for which the information set
N consists of n evaluations of f with $p(\psi_n) = 2^{n-1}$ and
$d(\psi_n) = (3/2)n^2 + (3/2)n - 7$. Hence

$z(\psi_n) = \dfrac{n\,c(f) + \tfrac{3}{2}n^2 + \tfrac{3}{2}n - 7}{n - 1}$.

The complexity index is minimized (Kung and Traub
[74b]) at $n^* = \mathrm{round}[1 + (\tfrac{2}{3}(c(f) - 4))^{1/2}] = O(c(f)^{1/2})$
and

$z(\psi_{n^*}) = c(f)\left(1 + \dfrac{\xi}{(c(f))^{1/2}}\right)$, $\xi > 0$.


It would only be reasonable to use this high an
order iteration for very small $\epsilon$. Assume $t \gg p^* = 2^{n^*-1}$.

Observe that $z(\psi_n)$ is a very "flat" function of n.
Thus $z(\psi_3) = (3/2)c(f) + 11/2$ and comparing this
with $z(\psi_{n^*})$ shows we can gain only another $(1/2)c(f)$.
Let $\Psi$ denote the class of all multipoint iterations
for which $w_U \ge 2$. Then

$\mathrm{comp}(\Psi) \le c(f)\,\lg(1+t)\left(1 + \dfrac{\xi}{(c(f))^{1/2}}\right)$.
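
The flatness of the complexity index as a function of n can be seen numerically. The sketch below is an illustration. It assumes $z(\psi_n) = (n\,c(f) + \tfrac{3}{2}n^2 + \tfrac{3}{2}n - 7)/(n - 1)$ and the minimizer $n^* = \mathrm{round}[1 + (\tfrac{2}{3}(c(f) - 4))^{1/2}]$. The cost c(f) = 100 is an arbitrary example value.

```python
import math


def z_multipoint(n, c_f):
    """Complexity index of the multipoint iteration psi_n, for n >= 2."""
    return (n * c_f + 1.5 * n * n + 1.5 * n - 7) / (n - 1)


def n_star(c_f):
    """Minimizing n from the closed form; meaningful when it is at least 2."""
    return round(1 + math.sqrt((2.0 / 3.0) * (c_f - 4)))


c_f = 100.0                                    # illustrative cost of one f evaluation
n_opt = n_star(c_f)
n_brute = min(range(2, 200), key=lambda n: z_multipoint(n, c_f))
gain = z_multipoint(3, c_f) - z_multipoint(n_opt, c_f)
```

With c(f) = 100 the closed form and the brute-force search both give n = 9, and moving from n = 3 to n = 9 saves less than (1/2)c(f).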


We can obtain a lower bound on the complexity of
the class of multipoint iterations by using an upper
bound on the maximal order of any multipoint
iteration and a lower bound on the combinatory
complexity. Kung and Traub [74a] conjecture that any
iteration without memory which uses n pieces of
information per step has order $p \le 2^{n-1}$. This
conjecture seems difficult to prove in general
(Woźniakowski [75b]) but has been established for many
important cases (Kung and Traub [73], Meersman
[75], and Woźniakowski [75b]).


6.  Summary  and  Extensions  to  the  Model 

We have constructed a non-asymptotic theory of
iterative computational complexity with strict lower
and upper bounds. In order to make the complexity
ideas as accessible as possible we have limited
ourselves to scalar non-linear problems. The natural
setting for this work is in a Banach space of finite or
infinite dimension and we shall do our analysis in
this setting in a future paper. We have focused on
the simplified model $e_i = A e_{i-1}^p$. More realistic
models include some of the following features:

1. $e_i = A_i e_{i-1}^p$ under various assumptions on the
structure of $A_i$.

2. $e_i = A_i e_{i-1}^{p_1} \cdots e_{i-m}^{p_m}$. This is the appropriate
model for iterations with memory.

3. Variable cost per iteration, $c_i$.

4. Include round-off error. Then $e_i$ will not converge
to zero.

We plan to analyze these more realistic models in the
future. We also intend to investigate additional basic
properties of complexity. Our various results will be
used to analyze the complexity of important
problems in science and engineering.
The CMU RT-CAD System: An Innovative
Approach to Computer Aided Design

Mario R. Barbacci and Daniel P. Siewiorek


I.  Introduction  and  Motivation 

As technology has evolved the primitive
components available to a digital system designer have
changed dramatically. Twenty-five years ago the
designer constructed his systems out of circuit level
components such as resistors and diodes.
Subsequently switching circuit level components, as
represented by gates and flip-flops, became
available as small scale integration (SSI) components.
With the introduction of medium scale integration
(MSI) register transfer level components appeared:
arithmetic and logic units, registers, shift registers,
etc. The advent of large scale integration (LSI) has
made memories and even processors primitive
components from which systems are designed. Two
trends can be observed from this technological
evolution: 1) primitive components continue to
increase in complexity and 2) the rate of
introduction of new components continues to increase.

In response to the first trend, designers have been
limiting their excursions into switching circuit level
design to only small portions of the system (e.g., bus
controllers, etc.). In some register transfer level
module sets (Bell [72], Clark [67]) these excursions
have been completely eliminated.

Because of the second trend, rapid technology
evolution, there is a need to shorten the delay time
between the introduction of a technology and its
effective use in new computing systems. Also, as
technology changes so does its cost. The design
process must, therefore, be accelerated if the
potentiality of the improving technology is to be realized.


This paper describes a set of design programs
developed  at  Carnegie-Mellon.  The  ultimate  goal  is 
to  minimize  the  effect  of  changing  technology  by 
building  a Computer  Aided  Design  System  that 
implements  a technology-relative  design  process. 

II.  Overview  of  the  Automatic  Design  Process 

Given  the  complexity  of  a digital  system,  design- 
ers have  sought  to  develop  automatic  means  to 
reduce  the  cost  and  time  of  the  design  process.  The 
objective was to relieve engineers of repetitive,
time-consuming tasks such as:

(1) The  generation  of  detailed  design  information 
(gate  and  chip  types,  etc.) 

(2)  The  control  of  changes  in  the  design  documents 

(3)  The  checking  of  the  system  for  electrical,  logical, 
and  physical  compatibility  (fan-out  limits,  etc.) 

(4) The generation of detailed manufacturing
information (chip placement, board layout, etc.)

This early view of design automation limited itself
to filling the gap between the low-level design
specifications and the manufacturing data.
Behavioral specifications were in the form of Boolean
equations and the design programs translated them
into their equivalent logic diagrams and wiring lists.

Most of the synthesis algorithms at this level dealt
with the problem of reducing or simplifying the
Boolean equations (Breuer [72]).

Subsequent efforts were directed towards a
system capable of accepting a higher level of behavioral
description, although still oriented towards a gate
level implementation (Darringer [69], Friedman
[69]).

Current design automation effort is shifting from
implementation in terms of the switching circuit
level to implementation in terms of the Register
Transfer level. Register Transfer level simulators
have preceded this trend by several years
(Darringer [69], Mesztenyi [68], Parnas [67], Rozenberg
[71]). The closeness of RT level descriptions to
conventional programming accounts for this early
success. Register Transfer level descriptions are
easy to transliterate into executable programs in a
conventional programming language (e.g.,
FORTRAN, Algol, etc.), thus providing inexpensive and
fast simulation (although in many cases RT
languages are compiled directly). Register Transfer
level synthesis algorithms have been less
successful. A few programs have been developed that take
an RT level description as input and compile it
directly into a known set of RT level hardware
modules (Chartran, AHPL). Figure 1 depicts a typical
RT design automation system. The RT level
description serves as input to several software modules.

Syntax checking ensures a well-formed description.
Static checking attempts to locate logical design
errors (such as deadlocks, redundancy, etc.). The
simulator is used to debug the design dynamically.
Finally, the description is cast into hardware via the
wiring list generator.

Figure 1. A Conventional Register Transfer Level Design
Automation System

Figure 2. An Augmented Register Transfer Level Design
Automation System

Figure 3. The Design Process in a RT Level Design System

Figure 4. The Design Process in the CMU RT-CAD System
(June 1975)

The  essential  feature  lacking  in  conventional  RT 
Design  Automation  (DA)  systems,  and  DA  systems  in 
general,  is  the  exploitation  of  alternative  implemen- 
tations derived  from  the  initial  behavioral  specifica- 
tion. Consider  the  augmented  DA  system  depicted 
in  Figure  2.  The  inputs  are  the  RT  level  description 
and  designer  given  constraints.  The  output  is  the 
specification/simulation  of  the  hardware  that  at- 
tempts to  optimize  the  system  according  to  the 
design  constraints.  By  allowing  the  description  of 
various  module  sets  the  system  can  perform  design 
relative  to  technology  thus  speeding  up  the  incor- 
poration of  new  technology  into  the  design  process. 
Also,  such  a system  will  allow  experimentation  with 
multiple  module  sets,  each  tailored  to  a specific 
class  of  problems.  The  system  would  also  facilitate 
the  design  of  the  module  sets  themselves.  Since  the 
system  operates  on  a symbolic  description  of  the 
modules,  a non-existing  module  set  can  be  fed  to  the 
system  for  experimentation  purposes.  Such  experi- 
ments will  point  out  the  advantages  and  disadvan- 
tages of  the  proposed  module  set. 

At  this  point  it  would  be  instructive  to  describe 
the  order  in  which  the  DA  programs  are  typically 
used  in  the  design  process.  This  will  serve  to  place 
subsequent  discussions  in  perspective.  Given  a 
computational  task,  there  are  usually  several  algori- 
thms that  can  be  employed.  The  algorithm  that  is 
selected  by  the  designer  is  described  to  the  design 
automation  system  (Figure  3)  and  placed  in  a data 
base. Subsequently all design automation programs
will use this data base. A high level simulator can
execute  from  the  data  base  to  facilitate  user  de- 
bugging of  the  initial  description. 

Next some evaluation and reshaping of the
algorithm is undertaken. Analysis tools have been
developed to check the algorithm for well-formedness
(e.g., deadlock conditions, etc.) (Huen [75]).
Perturbations of the basic algorithm can also be attempted
such as: series-to-parallel transformations,
replacing loop counters by wired-in control, and using
table look-up in lieu of computing the value of
functions. Thus attempts are made to first bind those
design decisions with global implications. While
these perturbations can be performed independent
of the physical design, the evaluation of their
ultimate desirability may depend upon the module
set used to implement the final, physical design.

Finally, the actual physical design is performed
in terms of RT level modules. The module set can
be selected from a library of module sets or a user
described set. At this level several forms of
allocation variations are encountered:

• Registers.  Determine  the  allocation  of  the  abstract 
variables  to  registers  and  memory. 

• Data  operators.  Determine  the  number  of  opera- 
tors of  each  type  in  the  design. 

• Control.  Select  control  schema  from  among  unary 
state  encoding,  binary  state  encoding,  micro- 
program control,  etc. 

• Bus-Link  clustering.  Many  RT  designs  start  with  a 
set  of  registers  for  variables  and  interconnect 
them  with  links  to  operators  (add,  shift,  multiply, 
etc.).  After  a point  the  interconnections  between 
certain  registers  and  operators  become  numerous 
enough  to  warrant  replacement  by  a bus. 

• Operator  interconnection.  The  interconnection  of 
operators  has  been  shown  to  have  a significant 
effect  on  the  test  generation  effort  required  for  the 
physical  implementation  (Stephenson  [74]). 

The signal level design verifier can be used to
analyze the intermodule signal relationships in
proposed module sets. Even well-established module
sets have exhibited deadlock behavior in what
appear to be straightforward interconnections (Huen
[75]).

A first version of the above system has been
implemented at Carnegie-Mellon and is shown in
Figure 4. The behavioral specifications of the system
to be designed are provided in terms of the ISP
language (Bell [71], Barbacci [75]). The compiler
produces an "object"¹ program which is then loaded
into the data base and manipulated by different
design programs.

The next five sections will treat the applications
programs in detail. Section III describes EXPL, a
module independent design program that examines
series-parallel variations in the original algorithm.
The following section presents the physical
allocators for two existing RT module sets, RTMs and
Macromodules. Section V discusses the heuristics
used by EXPL to explore the design space. Sample
design spaces, examples of the application of the
heuristics, and some observations are presented
in Section VI. Section VII concludes the discussion
of the existing system by briefly outlining the
remaining applications programs.

¹ The compiler produces an "object" program in
terms of a set of Register Transfer level primitive
operations. This program appears in the form of an
executable BLISS (Wulf [71]) program where each
Register Transfer operation is represented by a call
to a user-provided subroutine. By changing the set
of subroutines, the compiler can support many
diverse activities. The creation of the data base is, in
fact, done by a specific set of subroutines. The
compiler and the language are therefore
independent of the applications. The uniform compiler output
and the flexibility of the subroutine-call mechanism
have simplified the interfacing to other application
programs.

III.  Automatic  Design  Space  Exploration 

EXPL (Barbacci [73]) takes as input the object
code produced by the ISP compiler, together with a
set of user-given speed/cost constraints/tradeoffs.
The compiler output is used to generate a graph
representation of the behavior of the system.
Subsequently, various series-to-parallel and
parallel-to-series transforms on the graph are
attempted to establish a new design. Several
alternative designs are generated and passed to module
set evaluators which complete and evaluate the
design in terms of its hardware module set. Using
this evaluation and a set of heuristics, EXPL decides
which solutions should be kept to generate other
solutions by yet another application of the graph
transformations.

Figure 5 is the ISP description of an 8-bit
multiplier that will be used as a running example to
illustrate various aspects of EXPL. The algorithm is a
variant of the shift-and-add algorithm. The multiplier
is in the P register and the multiplicand is in the MPD
register and is assumed to occupy the leftmost 8 bits
of the register. The product will be in the P register.
The partial products are formed in the left hand side
of the P register and shifted to their appropriate
position in the final product. A counter, C, is used to
keep track of the number of times the basic
multiplication step has been performed. Additional
details about the algorithm can be found in (Bell
[72]).

The description begins with the specification of
the label for the program (MULTIPLIER). Labels are
used in ISP to identify activities so that they can be
branched to, or used as subroutines.


MULTIPLIER:=
(DECLARE MPD<15:0>; P<15:0>; C<15:0>;
 ERALCED
 C ← 8 NEXT
 L1:= (
      (DECODE P<0> => P ← P ↑SR0 1;
                      P ← (P+MPD) ↑SR0 1)
      NEXT C ← C-1 NEXT
      (IF C NEQ 0 => L1)
      )
)

Figure 5. The ISP Description of the Multiplier.


The program itself is enclosed in parentheses, and
consists of two parts: the declarations and the
specification of the behavior. The former are
specified as a list of individual component
declarations (multiplicand, multiplier/product, and
step counter), using the reserved identifiers
DECLARE and ERALCED as brackets. The
specification of the activities of the system is given as
a list of two sequential steps. The first step (C ← 8)
initializes the counter and the second is given by a
labeled (L1) block of activities. These consist of a
sequence of three steps. The first one performs the
basic multiplication operation; the second step
decrements the counter; the third step tests the
counter to see if the operation has been completed.
If the value of the counter has not reached 0 then a
jump to the label is indicated by using the label as an
activity. If the counter is 0 then control flows out of
the labeled statement and reaches the end of the
program.

The basic multiplication operation is described
using the DECODE control operation. It implements
an n-way branch depending on the value of the
expression following the operator. The alternative
paths selected by this operation are given as a list
using the ";" as delimiter. The first path (P ← P ↑SR0
1) is selected if the value of the controlling
expression (P<0>) is 0; the second path (P ← (P + MPD)
↑SR0 1) is selected if the value is 1. The operator
↑SR0 represents a shift right inserting zeroes. The
number of shifted positions is given by the second
operand (in this case the integer 1).
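
The behavior described above can be mimicked in a few lines of ordinary code. This is a minimal sketch for illustration, not the output of the ISP compiler. It assumes unsigned 8-bit operands, and it assumes that the sum P + MPD keeps its carry bit until after the one-place right shift (the text does not say how the carry is handled). The function name is hypothetical.

```python
MASK16 = 0xFFFF


def isp_multiply(multiplier, multiplicand):
    """Simulate the shift-and-add loop of the multiplier description.

    Assumptions (not stated in the text): operands are unsigned 8-bit
    values, and the sum P + MPD keeps its carry until after the shift.
    """
    if not (0 <= multiplier < 256 and 0 <= multiplicand < 256):
        raise ValueError("operands must fit in 8 bits")
    p = multiplier             # P: multiplier in the low 8 bits
    mpd = multiplicand << 8    # MPD: multiplicand in the leftmost 8 bits
    c = 8                      # counter initialized to 8
    while True:                # block labeled L1
        if p & 1 == 0:         # DECODE on bit 0 of P, path 0: shift only
            p = p >> 1
        else:                  # path 1: add the multiplicand, then shift
            p = (p + mpd) >> 1
        c -= 1                 # decrement the counter
        if c == 0:             # loop back to L1 while the counter is not 0
            break
    return p & MASK16          # the product is left in P
```

For example, `isp_multiply(3, 5)` passes through the values 641 and 960 and then shifts down to 15.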

Figure  6 shows  the  graph  representation  of  the 
ISP  description.  The  mapping  from  the  ISP  descrip- 
tion to  the  graph  form  is  apparent  from  the  example. 
The  system  graph  contains  a unique  entry  point  (the 
START  operation)  and  a unique  exit  point  (the  STOP 
operation).  In  addition  to  these  two  operations,  there 
can  be  five  other  types  of  operations  in  the  graph 
model: 

• branch,  activates  one  of  the  output  paths  depend- 
ing upon  the  value  of  some  operand. 

• serial-merge,  activates  its  output  path  when  any  of 
its  input  signals  arrive. 

• diverge, activates concurrently all of its output
paths.

• parallel-merge, activates its output path when all
of its input signals arrive.

• data-operation (other).

Examination of the graph for the multiplier
example indicates several possible alternative
designs. For instance, the computation of the loop
count (C ← C-1) does not depend on the shifting and
adding steps (P ← P SR0 1 and P ← (P + MPD) SR0 1);
the two sets of operations do not have variables in
common. Thus the decrement of the loop counter can
be performed in parallel with the basic
multiplication step, as shown in Figure 7.
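The independence test used in this argument (two groups of operations may run concurrently when they have no variables in common) can be sketched as follows. The `Operation` record and the read/write sets are illustrative assumptions; EXPL's internal tables are not described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    reads: frozenset
    writes: frozenset


def variables(ops):
    """All variables touched (read or written) by a group of operations."""
    used = set()
    for op in ops:
        used |= op.reads | op.writes
    return used


def independent(group_a, group_b):
    """True when the two groups have no variables in common."""
    return variables(group_a).isdisjoint(variables(group_b))


multiply = [
    Operation("P ← P SR0 1", frozenset({"P"}), frozenset({"P"})),
    Operation("P ← (P + MPD) SR0 1", frozenset({"P", "MPD"}), frozenset({"P"})),
]
decrement = [Operation("C ← C-1", frozenset({"C"}), frozenset({"C"}))]
test = [Operation("C = 0?", frozenset({"C"}), frozenset())]

print(independent(multiply, decrement))         # multiplication vs. decrement
print(independent(decrement, test))             # decrement vs. test share C
print(independent(multiply, decrement + test))  # counter operations as one group
```

Treating any shared variable as a conflict is conservative: two operations that only read the same variable could still run concurrently, but the sketch follows the stricter wording of the text.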

The graph thus obtained shows that the testing of
the loop counter, although independent of the
multiplication steps, cannot be performed in parallel
with the decrement of the loop count. This fact rules
out a transformation similar to the one used previously.
However,  it  is  possible  to  insert  the  testing  step  in  the 
same  path  as  the  decrement.  This  preserves  the 
required  ordering  of  the  counter  operations  but  now 
the  testing  is  done  concurrently  with  the  multiplica- 
tion step  as  shown  in  Figure  8. 

The last graph represents an "optimal" speed
implementation. We are not considering, at this point,
specific module-set-dependent optimizations, for
instance register allocation in the RTM data
operators. This type of optimization is left up to the
individual technology-dependent evaluators.

It should also be noted that, in the example above,
although it took two steps to arrive at the final design,
we could have taken the two counter operations
(decrement and testing) as one group and placed the
group in parallel with the basic multiplication step. In
other words, we can achieve the same final result in
one step by varying the size of the graph partitions.

These graph transformations have taken us from
an original solution to two additional design
alternatives. Both represent an improvement in speed
with respect to the initial design. The design space,
explored automatically by EXPL, can be represented
as a two-dimensional plot. Each design is
represented by its cost and time coordinates,
computed by the technology-dependent design
evaluators.

Figure 9

The Multiplier Design Space (Macromodules)

Figure 10

The Multiplier Design Space (RTM)

The multiplier example was implemented using
both RTM's and macromodules and the design
spaces are shown in Figures 9 and 10. In both plots,
point 0 represents the initial solution and points 1
and 2 represent the alternative designs. Arrows
indicate the steps taken in the derivation of the
alternatives.

The  position  of  the  alternative  design  in  the  design 
space  is  dictated  by  the  evaluation  of  its  cost  and 
speed.  These  two  parameters  are  measured  in  terms 
of  a specific  technology,  using  specialized  evaluator 
routines,  as  described  in  the  following  section. 


IV.  Module  Set  Evaluators 

Given  a candidate  design,  whether  it  is  the  initial 
graph  or  one  obtained  by  a transformation,  its  cost 
and  speed  must  be  measured  to  ascertain  its  relative 
position  in  the  design  space.  This  position  indicates 
the  quality  of  a solution.  The  evaluation  process  is 
clearly  dependent  on  the  module  set  used  and  is 
currently performed by ad hoc routines, independent
of the graph transformation algorithms. This
section describes the evaluation procedure used for
RTM (Bell [72]) and macromodule (Clark [67])
systems.


The  candidate  design  is  represented  by  a descrip- 
tion of  its  behavior,  i.e.,  a graph.  In  this  representa- 
tion the  control  and  data  operations  are  not  bound  to 
specific  physical  components.  The  evaluation  is  per- 
formed by  applying  a series  of  binding  algorithms 
that  map  this  (abstract)  behavioral  description  into  a 
physical  description.  This  is  the  representation 
which is then evaluated. It should be pointed out that
the design space we are dealing with is really an
evaluation space, not a structural space. In other
words, for each point in the evaluation space there
may be more than one structure.

The order in which the binding process is
performed is dictated by the interconnection rules of
the module set. Specifically, in the RTM set all data
operators and operands are connected to a bus
(Figure 11b), which not only takes part in all data
operations but also has some control function. This
feature of RTM requires that the binding of the buses
precede the binding of the data modules. Moreover,
in RTM, concurrency of operations is dictated by the
number of available buses. Thus the selection of the
buses must follow the binding of the control
operators. The order of bindings in RTM is the
following:

(1)  Selection  and  identification  of  control  modules. 

(2)  Selection  and  identification  of  the  buses  required 
to  implement  the  control  flow  structure. 

(3) Selection and identification of the data modules
that are to be attached to each individual bus.

Macromodules represent a different design
philosophy. Modules are connected directly and
there is no need for buses as in the RTM set. The
basic  feature  of  the  modules  is  the  register  stack. 
The  register  or  memory  module  occupies  the  base  of 
the  stack  and  takes  part  in  the  data  operations 
plugged  in  above  it,  Figure  11a.  Any  data  operation 
whose  direct  output  is  to  be  stored  in  register  X must 
be  physically  located  above  register  X and  cannot  be 
shared  with  other  registers.  Thus,  the  data  operands 
dictate,  to  a large  degree,  the  structure  of  the 
system.  Special  cables  are  used  to  interconnect 
stacks. Control signals are generated upon the
completion of a data operation in order to initiate the next
operation(s).  The  order  of  binding  is  dictated  by 
these  interconnection  rules: 

(1) Selection and identification of data modules.

(2)  Selection  and  identification  of  the  data  operators 
that  are  to  be  attached  to  each  data  operand. 

(3)  Selection  and  identification  of  the  control 
modules. 

In  both  RTM's  and  macromodules  the  binding  of 
control  operations  to  control  modules  is  usually 
straightforward.  There  is  a one-to-one  cor- 
respondence between  the  abstract  control 
operations  and  the  control  modules. 

The binding of the data operations is more
difficult. In general there is not a one-to-one mapping
from the operations in the graph to the operators in
the module set. For instance, in RTM the data
operations are performed in a general purpose
arithmetic unit (the DMgpa module). Mapping from
the abstract operations in the graph into the RTM
structure requires the use of specialized templates.
These templates act like macros in an assembly
language. For each abstract data operation in the
graph there is a template that indicates the number
and order of RTM operations that must be executed
in order to achieve the desired effect. For instance,
an operation like A ← B + C is mapped into a
sequence of RTM primitive operations like:
DMgpa/A ← B; DMgpa/B ← C; A ← DMgpa/A +
DMgpa/B.
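The macro-like behavior of such a template can be illustrated with a small sketch. The table layout, the operation name "add", and the `expand` helper are hypothetical; only the three-step expansion itself comes from the text.

```python
# Hypothetical template table: one entry per abstract data operation.
RTM_TEMPLATES = {
    "add": [
        "DMgpa/A ← {src1}",
        "DMgpa/B ← {src2}",
        "{dst} ← DMgpa/A + DMgpa/B",
    ],
}


def expand(op, dst, src1, src2):
    """Expand an abstract operation into its sequence of RTM primitives."""
    return [step.format(dst=dst, src1=src1, src2=src2)
            for step in RTM_TEMPLATES[op]]


for step in expand("add", "A", "B", "C"):
    print(step)
```

The length of the expanded sequence is what an evaluator would charge against the time of the abstract operation.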

Templates of a similar nature are used by the
macromodule evaluator, although in this case the
nature of the module set allows more flexibility. For
instance, using the example above, there are several
ways of implementing the statement and the choice
can be critical. Since there are no general purpose
arithmetic units (the operator modules are
specialized), the statement A ← B + C can be
implemented alternatively as: A ← B; A ← A + C or
A ← C; A ← A + B.

The  decision  is  critical  and  depends  upon  the  data 
operators  already  connected  to  a given  register  (in 
this  case,  register  A)  or  which  will  be  connected  in 
order  to  implement  subsequent  operations.  If  the 
operator A ← A + C has already been placed in the
stack  of  register  A,  the  first  option  is  clearly  the  one  to 
adopt.  On  the  other  hand,  if  none  of  the  operators 
exists  then  the  template  adopted  depends  upon  the 
future  uses  of  the  operator,  a decision  based  on  a 
global  analysis  of  the  graph. 
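The local part of this decision can be sketched as below. The function name and the representation of a register stack as a set of operator strings are assumptions for illustration; the global analysis used when neither operator exists is not reproduced, so the sketch simply returns `None` in that case.

```python
def choose_template(dst, src1, src2, stack_operators):
    """Pick an implementation of dst ← src1 + src2 for a macromodule stack.

    stack_operators: operators already plugged into the stack of register dst.
    Returns the chosen two-step sequence, or None when neither adder exists
    and the choice must be deferred to a global analysis of the graph.
    """
    option_1 = [f"{dst} ← {src1}", f"{dst} ← {dst} + {src2}"]
    option_2 = [f"{dst} ← {src2}", f"{dst} ← {dst} + {src1}"]
    for option in (option_1, option_2):
        if option[1] in stack_operators:   # reuse an operator already in place
            return option
    return None


print(choose_template("A", "B", "C", {"A ← A + C"}))
print(choose_template("A", "B", "C", set()))
```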

From the previous discussion, it is clear that in
RTM's the critical choices are associated with the
binding of the data operands to memory modules.
For instance, we can allocate a variable to one of the
two registers of an arithmetic unit. There are certain
trade-offs  that  can  be  achieved  between  the  time 
taken  to  move  variables  in  and  out  of  these  special 
registers  versus  the  cost  of  adding  extra  arithmetic 
units  in  order  to  use  their  registers.  In 
macromodules  the  critical  choices  are  associated 
with  the  binding  of  data  operations  to  data 
operators.  For  instance,  we  can  opt  to  implement  all 
data operators in terms of an auxiliary register, used
as  an  accumulator.  There  are  trade-offs  between  the 
time  needed  to  route  the  data  to  and  from  the  ac- 
cumulator versus  the  cost  of  having  more 
specialized  data  operators,  associated  with  the  in- 
dividual registers. 

An important distinction between macromodules
and RTM is the degree and flexibility with which
concurrency of operations can be implemented. As we
mentioned before, RTM operands (registers,
memories, etc.) are physically connected to a single
bus. This implies that variables cannot be readily
shared  by  concurrent  computations.  Common 
variables  must  be  copied  and  allocated 
separately— a process  which  degrades  both  the  cost 
and  speed  of  the  design.  Macromodules  on  the  other 
hand  allow  almost  unlimited  concurrency;  variables 
are  accessible  directly  from  any  part  of  the  system 
and  there  is  no  need  to  allocate  extra  copies.  These 
properties  of  the  module  sets  imply  that,  while  in 
macromodules  the  intuitive  feeling  that  parallelism 
implies  extra  cost  and  extra  speed  holds  true,  in 
RTM's  the  need  to  allocate  and  transfer  variables 
between  the  buses  may  so  degrade  the  performance 
that  for  certain  systems  more  concurrency  implies 
more  cost  and  less  speed. 

V.  Heuristics  and  Design  Space  Trade-Offs 

Due to the interaction between series/parallel
transformations in EXPL, it is a difficult task to
formalize the optimization (improvement of alternative
structures) as a mathematical optimization problem.
The main difficulty is the fact that transformations
apply to subgraphs of arbitrary size and, as a
consequence, transformations in a given alternative
structure may or may not be feasible or desirable in
structures derived from it. It is also the case that new
cases of transformations become feasible or
desirable only after a specific sequence of
transformations has been applied.

Two parameters will be used to describe the
design space: the cost of the hardware involved and
the operational time. The former is obtained by
adding the costs of the components used in both the
data and control structures. The latter is obtained
from the average speed of the operations involved.

For a straight sequence of operations the time
required is the sum of the individual times (Figure
12.a). In the presence of concurrent activities, the
operation time is that of the longest (timewise)
sequence (Figure 12.b).

When computing the times required by the
alternative paths of a branch operation EXPL assumes,
by default, that all such paths have equal
probabilities of being executed (the probability
being 1/n for n-way branches). This default can be
overruled by the user by specifying the branching
probabilities for the individual paths. The
computation of the times required by the paths is then
weighted by the branching probability associated
with the path (Figure 12.c). The execution time is then
the sum of these weighted path times.

The presence of cycles (loops) adds some
complexity to the estimation of the operation time. In this
case the frequency of execution of the operations is
assumed to be proportional to the level of nesting.
Conceptually this is equivalent to
replacing the cycle by a sequence of multiple copies
of the individual operations. Since the number of
times a loop is executed (i.e., the number of copies)
is usually unknown, a default (2) is assumed (this is a
consequence of the default 50% probability of
branching back to the loop head). This default may
be overruled by the designer by specifying an
estimated loop count or, alternatively, simply the
branching probabilities if the loop count is not
known (Figure 12.d).
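The four timing rules of Figure 12 can be summarized in a short sketch. The function names are illustrative, and the geometric-series reading of the loop default (a body that repeats with probability p runs 1/(1 - p) times on average, giving 2 for p = 0.5) is an assumption about how the default of 2 follows from the 50% branch probability.

```python
def serial_time(times):
    """Straight sequence: the sum of the individual times."""
    return sum(times)


def parallel_time(path_times):
    """Concurrent activities: the time of the longest path."""
    return max(path_times)


def branch_time(path_times, probabilities=None):
    """n-way branch: path times weighted by branching probability (default 1/n)."""
    n = len(path_times)
    if probabilities is None:
        probabilities = [1.0 / n] * n
    if len(probabilities) != n or abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("need one probability per path, summing to 1")
    return sum(p * t for p, t in zip(probabilities, path_times))


def expected_loop_count(p_back=0.5):
    """Mean number of body executions when the loop repeats with probability p_back."""
    if not 0.0 <= p_back < 1.0:
        raise ValueError("p_back must be in [0, 1)")
    return 1.0 / (1.0 - p_back)


def loop_time(body_time, loop_count=None, p_back=0.5):
    """Loop: body time multiplied by the given or estimated loop count."""
    if loop_count is None:
        loop_count = expected_loop_count(p_back)
    return loop_count * body_time


print(serial_time([3, 4, 5]), parallel_time([7, 9]))
print(branch_time([10, 20]), loop_time(6))
```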

Having defined the parameters of the design
space we can now describe the trade-offs involved in
the transformation rules. Connectivity and data
dependency are used in the system to indicate the
feasibility of a transformation. Feasible
transformations, however, do not necessarily imply any
advantage in their application, and the desirability of
such a transformation is indicated by a different set
of conditions.

The exploration of the design space in our system
is performed by a group of heuristic routines that
produce alternative designs in a goal-oriented
fashion, the goal being specified by the designer.
Ideally, the goal is to find an alternative structure
whose position in the design space is as close as
possible to the origin (0 cost and 0 time). This ideal
case is, however, not easily found in real solutions.
The usual case is that the least expensive solution is
not the fastest and vice versa. This characteristic
provides a rough classification of the design
objectives into two classes, minimal cost and minimal
time.

Although a designer's aim can be classified
according to these objective functions, it may be the case
that the real objective is more complicated in nature,
namely, some combination of time and cost. For
instance, the objective could be something like: "the
fastest alternative structure not costing more than x
dollars."

For  simplicity,  the  subspace  of  acceptable 
solutions  will  be  defined  by  a set  of  straight-line 
segments  whose  slopes  reflect  the  objective  func- 
tions. In  the  example  above  a single  straight  line, 
parallel  to  the  cost  axis,  would  be  used  to  divide  the 
space  in  two  halves.  Only  those  solutions  that  lie  in 
the  semispace  containing  the  origin  are  considered 
acceptable.  These  solutions  represent  im- 
provements along  the  design  goal. 

More complex constraints can be described by
using lines of the form C = -mT + b, where m is a
parameter indicating how many dollars the designer
is willing to pay for each time unit saved (if time is the
primary goal) or how many time units the designer is
willing to sacrifice for each dollar saved (if cost is the
objective). An example, Figure 13, will clarify this
description.

Assume that the primary objective is a reduction
in time and that the designer wants a time/cost
trade-off of at most m dollars for each time unit
improvement. Furthermore, assume that the original
design is characterized by C1 and T1. The
"acceptable trade-off" subspace would thus be delineated
by two line segments: one parallel to the cost axis
from (T1,C1) to (T1,0), and the other through
(T1,C1) with slope -m. By studying the control flow
and data dependencies in this original structure,
four transformations are found which yield four
alternative solutions derived from the original one:
A, B, C, and D.

By dividing the space according to the trade-off
lines, alternatives B, C, and D can be rejected
because their characteristics are not within the
acceptable subspace (i.e., they take more time or the
decrease in time costs too much). The remaining
alternative, A, represents an improvement in time while
the cost to achieve the improvement is under the
designer's threshold.
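The acceptance test for a time-oriented goal can be written out as a small sketch. The coordinates below are invented for illustration (Figure 13 is not reproduced here), and requiring a strict improvement in time is an assumption about how points on the boundary are treated.

```python
def acceptable(t, c, t1, c1, m):
    """Time-oriented goal: faster than (t1, c1), paying at most m dollars
    per time unit saved. The second test is the line through (t1, c1)
    with slope -m."""
    if t >= t1:                       # not faster than the current design
        return False
    return c <= c1 + m * (t1 - t)     # on or below the trade-off line


# Hypothetical original design and four alternatives (time, cost).
T1, C1, M = 100, 50, 2
alternatives = {"A": (80, 70), "B": (110, 40), "C": (90, 95), "D": (95, 65)}
kept = [name for name, (t, c) in alternatives.items()
        if acceptable(t, c, T1, C1, M)]
print(kept)
```

For a cost-oriented goal the roles of the two axes would be exchanged.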

The process can now be applied to A in an
identical manner. Design A is taken as the new initial
solution and a new "acceptable trade-off" subspace
is defined by a line segment (T2,C2) to (T2,0) and a
line with slope -m through (T2,C2). Since in some
cases more than one alternative can be left for
further exploration, this process takes the form of a
tree walk where the nodes represent alternative
solutions and the edges are the transformations
applied. In some instances, identical structures can
be obtained by different sequences of
transformations and the exploration of the design space is a
graph-walking process. In any event, a path ends
when no alternative solutions worth exploring can be
reached from a given point. When all possible paths
have been explored the end nodes are measured
against the primary objective and the best one
chosen.

In  general,  the  space  of  alternative  solutions  looks 
more  like  a graph  than  a tree.  Several  paths  (i.e., 
sequences  of  transformations)  may  lead  to  the  same 
solution.  Thus,  it  is  important  to  detect  points  in  the 
space  that  have  already  been  examined.  Other 
problems  that  arise  in  the  exploration  process  have 
to  do  with  the  cost  of  the  process  itself.  EXPL  does 
not  perform  a brute  force  search.  Accepting  an  alter- 
native solution  for  further  exploration  depends  upon 
the goals indicated by the user. Besides the main
goals (speed, cost, and a trade-off factor) mentioned
before, the user can also specify a minimum
percentage gain for a transformation-derived solution to be
acceptable. If the gain falls below this threshold the
new design is rejected. This pruning process, when
applied indiscriminately, can lead to an incomplete
exploration. It may be the case that, although a
derived solution is worse (according to the goals)
than its parent solution, solutions derived from the
former could in fact be better than the parent. EXPL
handles  the  detection  of  this  type  of  local  optimality 
by  allowing  the  user  to  specify  a rejection  level.  The 
rejection  level  indicates  whether  or  not  non- 
improving solutions  are  to  be  further  explored.  The 
user  specifies  the  maximum  length  of  such  non- 
improving paths. 
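A rough sketch of such a pruned graph walk follows. It is not EXPL's algorithm: the function signature, the use of time as the only objective, and the reading of "rejection level" as the maximum number of consecutive below-threshold steps on a path are all assumptions made for illustration.

```python
def explore(initial, neighbors, time_of, min_gain=0.10, rejection_level=0):
    """Walk the space of designs and return the fastest one found.

    neighbors(design) -> designs reachable by one feasible transformation.
    time_of(design)   -> positive number, lower is better.
    min_gain          -> minimum fractional gain over the parent design.
    rejection_level   -> maximum length of a non-improving path.
    """
    visited = {initial}               # designs already examined
    best = initial
    stack = [(initial, 0)]            # (design, current non-improving run)
    while stack:
        design, bad_run = stack.pop()
        if time_of(design) < time_of(best):
            best = design
        for candidate in neighbors(design):
            if candidate in visited:  # reached earlier by another path
                continue
            gain = (time_of(design) - time_of(candidate)) / time_of(design)
            if gain >= min_gain:
                next_run = 0
            elif bad_run < rejection_level:
                next_run = bad_run + 1
            else:
                continue              # pruned: gain below the threshold
            visited.add(candidate)
            stack.append((candidate, next_run))
    return best


# Hypothetical space: d1 is worse than d0 but leads to the best design d2.
times = {"d0": 100, "d1": 105, "d2": 70, "d3": 95}
edges = {"d0": ["d1", "d3"], "d1": ["d2"], "d2": [], "d3": []}
print(explore("d0", lambda d: edges[d], lambda d: times[d], 0.10, 0))
print(explore("d0", lambda d: edges[d], lambda d: times[d], 0.10, 1))
```

With a rejection level of 0 the walk stops at the initial design; allowing one non-improving step lets it pass through d1 and reach d2, which is the local-optimality effect described above.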

The  following  section  briefly  presents  several 
examples  of  design  spaces.  The  examples  illustrate 
some  of  the  points  discussed  previously. 

VI.  Sample  Design  Spaces 

In this section we will present three examples of
the design spaces explored by EXPL. We will not
discuss the specific systems whose design spaces are
depicted in Figures 14, 15, and 16. The examples will
be used to show the characteristics of the design
spaces and the exploration procedures.

Figure 14 shows the design space for an RTM
system that is used as a controller for the X- and Y-
plates of an oscilloscope. The system is used at CMU
for RTM demonstrations (the "Munching Squares
Generator"). The first characteristic that can be
noticed is the stratification of the alternative designs.
The solutions appear in horizontal bands
representing solutions of similar cost. This is due to the high
cost of the RTM buses compared with the cost of the
other modules in the RTM set. The space is divided
into bands corresponding to the 1, 2, 3, and 4 bus
solutions.

The figure shows the degrading effect in RTMs of
sharing variables between concurrent computations.
The best solution (in terms of speed) uses 3 buses
and is faster than the 4 bus solutions. The algorithm
is such that, although it allows a high degree of
concurrency, when this degree exceeds a certain
threshold there is a loss of speed in the total system.
The path followed to find the best solution is
indicated in the figure. It is interesting to observe the
transition from solution 2 to solution 3. There is a
substantial gain in speed together with a reduction in
cost. The explanation is that once the cost of a bus
has been accepted as a reasonable price to pay for a
given gain in speed it does not cost much to spread
the load and perform more operations concurrently.
Indeed, as the example shows, alternative allocation
of the computations to the buses, for a fixed number
of buses, is crucial.

Figure 15 depicts a feature of the search
procedure used in EXPL. When a solution is
analyzed the set of feasible transformations that can
be applied to its graph is tabulated. The
improvement factor specified by the designer is then used to
prune this table. This pruning takes place before a
transformation is applied and is based on a
preliminary "best case" analysis of a candidate
transformation. The solution derived by applying the
transformation may or may not realize the potential
gain indicated by the preliminary analysis. This
reduction in the predicted gain is due to several
causes. If the goal is a reduction of cost, performing
two concurrent operations in sequence may not, in
the case of RTMs, result in a reduction in the number
of buses (other computations may require the bus
that was thought to be expendable). If the goal is a
gain in speed, adding buses may result in a loss of
speed due to the time required to copy and move
variables between the buses in the system. Similar
considerations can be applied to the case of
macromodules.

Figures 14 and 15 correspond to the design space
for the same RTM system explored using different
improvement thresholds. In the space shown in
Figure 15, the preliminary improvement threshold
was set to a higher level (20%) than in the space
shown in Figure 14 (10%). An interesting
phenomenon occurred. The transformation
indicated by the directed line in Figure 15 had a very
promising preliminary evaluation (over 30%
predicted gain). When the transformation was
applied, the new solution did not realize the
predicted gain. It was, nevertheless, better than the
original solution and was later chosen by the system
as the best solution. All feasible (i.e., applicable)
transformations to this new solution were then
examined and none of them promised to be better
than the threshold. All of these transformations were
then rejected and the exploration path was
terminated. When the same situation appeared in the
example of Figure 14, there were several
transformations that were better than the new, lower
threshold. One of them led in fact to the best solution
of the space of Figure 14. It is interesting to observe
in Figure 14 that the slope of the transformation from
solution 1 to solution 2 indicates a better cost/speed
trade-off than the transformation from the original
solution (point 0) to solution 1. The gain in speed
produced by the transformation, although smaller
than the threshold used in Figure 15, was achieved
completely; there was no overhead added to the
system by the extra concurrency.

Figure 14

RTM Design Space (MSG System with 10% Improvement Factor)

Figure 15

RTM Design Space (MSG System with 20% Improvement Factor)

Figure 16

Macromodule Design Space (Conveyor-Bin System)

Anomalies of this type are not uncommon in the
modular design spaces explored so far; if anything,
they tend to be the rule rather than the exception.
The pruning of the applicable transformations,
based on a preliminary analysis, can lead us to
ignore certain transformation paths that may yield
better solutions. EXPL is, in this sense, not very
smart. Better heuristics are needed and research in
this area is actively pursued by the implementors of
the system. It is valid to ask, "why then should the
system do any pruning at all?" The only reason we
can  provide  is  based  on  the  analysis  of  the  cases 
studied  so  far.  Applying  a transformation  without  any 
considerations  to  its  possible  gain  is  an  expensive 
proposition.  For  any  solution,  branching  factors  (i.e., 
number  of  feasible  transformations)  of  30  to  50  are 
not  uncommon.  Applying  a transformation  implies  a 
reconfiguration  of  the  graph  and  the  recomputation 
of  several  associated  tables— an  expensive  opera- 
tion in  the  current  implementation.  Applying  each 
feasible  transformation  can  lead  to  a very  expensive 
design  process.  The  system  as  implemented  allows 
the  designer  to  guide  the  exploration  via  an  interac- 
tive command  language.  In  this  interactive  mode, 
EXPL  does  not  perform  any  pruning  and  the 
designer is free to order the system to perform any
feasible  transformation,  regardless  of  its  predicted 
gain.  The  automatic  mode  of  exploration  can 
therefore  be  used  selectively  under  user  guidance. 

Figure 16 shows the design space for a system
designed as a controller for a conveyor-bin unit. The
design space corresponds to the alternative designs
implemented using macromodules. The figure is a
good example of a design space with multiple paths
leading to the same solution. The space
configuration also indicates a characteristic of macromodular
systems. Once a basic design is implemented,
variations in the level of concurrency do not present
the radical changes in cost typical of RTM systems.
The basic costs of the macromodular system are
given by the memory and data operation modules
(the "stacks"). Variations in concurrency only imply
adding or eliminating control modules and cables, a
minor fraction of the total cost.

VII.  Other  Design  Tools 

Another application is design verification. It is
possible to develop an ISP description that is
syntactically correct but that does not make sense logically.
Figure 17.a depicts a syntactically correct ISP
description while Figure 17.b illustrates the
corresponding graph. The graph is essentially the same
one produced by the ISP compiler. The data
operations have been deleted as a notational
convenience (we can think of the data operations as being
assimilated into the arcs connecting the control
operations).

In the case of x = 1 the right half of the parallel
merge in the graph would receive two control signals
(one from the right half of the diverge, the other from
the left half via the branch). The other input to the
parallel merge would not receive a control signal and
the system would deadlock at the parallel merge.
Analytical tools based on the vector addition system
(VAS) (Huen [75]) have been programmed to detect
such design flaws.

The  VAS  is  best  introduced  by  example.  Consider 
Figure  17.b.  The  arcs  in  the  graph  represent  register 
transfers  while  the  vertices  represent  control 
primitives.  Each  arc  may  contain  tokens  repre- 
senting evocation  of  the  associated  register  transfer. 
Graphically  a token  is  represented  by  a dot  on  an 
arc.  A marking  of  a graph  with  r arcs  is  a mapping 
from  the  set  of  r arcs  to  an  r-dimensional  vector  of 
nonnegative  integers,  each  of  which  represents  the 
number  of  tokens  on  the  corresponding  arc. 

A vertex with a token on its single input arc is said
to be enabled. Only enabled vertices can fire. The
firing of a vertex removes a token from the input arc
and deposits a token on its output arc. For the case
of multiple input arcs there is an associated logic
condition, either disjunctive (signified by a +) or
conjunctive (*). A vertex with disjunctive input arcs is
enabled when any input arc has a token. Firing the
vertex removes a token from one input arc. This
corresponds to a serial merge in the compiler-produced
graph. A conjunctive input condition requires tokens
on all input arcs before the vertex is enabled
(parallel merge). Firing the vertex removes a token
from all the input arcs. Likewise a set of output arcs
can be disjunctive or conjunctive. When a vertex with
disjunctive output condition fires it places a token on
one of the output arcs (branch). The conjunctive
condition places a token on all the output arcs
(diverge). A simulation is a sequence of permissible
vertex firings.

Simulations are conveniently represented by the
Vector Addition System (VAS) (Huen [75]). Figure
17.c depicts the VAS for the graph in Figure 17.b.
The VAS consists of an initial marking vector M0 and
a set of displacement vectors which correspond to
vertices. Each component of a vector corresponds
to an arc. All valid firings (new markings) of the
graph can be determined by adding a displacement
vector to the current marking Mi. Those additions
which result in all marking vector components being
nonnegative are valid markings and can be used to
establish subsequent valid markings. For example,
the only valid marking reachable from the initial marking M0
by the addition of a single displacement
vector (e.g., D1) in Figure 17.c is (0,1,1,0,0,0,0). The
displacement vector D2 does not lead to a valid
marking since the result of its addition to M0 is
(1,0,0,1,0,0,-1).
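
As a rough illustration of the vector addition step, the sketch below adds each displacement vector to a marking and keeps only the nonnegative results. The initial marking `M0` and the vectors `D1` and `D2` are assumptions, chosen only to be consistent with the two sums quoted above; Figure 17.c is not reproduced here, so the actual VAS may contain different or additional vectors. The helper names are likewise hypothetical.

```python
from typing import Dict, Tuple

Marking = Tuple[int, ...]

# Assumed values, chosen only so that M0 + D1 and M0 + D2 match the sums in the text.
M0: Marking = (1, 0, 0, 0, 0, 0, 0)
DISPLACEMENTS: Dict[str, Marking] = {
    "D1": (-1, 1, 1, 0, 0, 0, 0),
    "D2": (0, 0, 0, 1, 0, 0, -1),
}


def add(marking: Marking, displacement: Marking) -> Marking:
    """Component-wise vector addition."""
    return tuple(m + d for m, d in zip(marking, displacement))


def is_valid(marking: Marking) -> bool:
    """A marking is valid when no arc holds a negative number of tokens."""
    return all(component >= 0 for component in marking)


def successors(marking: Marking) -> Dict[str, Marking]:
    """Map each applicable displacement vector to the valid marking it produces."""
    result = {}
    for name, d in DISPLACEMENTS.items():
        candidate = add(marking, d)
        if is_valid(candidate):
            result[name] = candidate
    return result


print(successors(M0))                # under these assumptions only D1 applies to M0
print(add(M0, DISPLACEMENTS["D2"]))  # has a negative component, hence not a valid marking
```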

A control flow tree depicting all possible markings
(or states) of the VAS can be constructed. A portion
of that tree for our example is shown in Figure 18.
Nodes are appended to the tree until, for each leaf,
either its marking is identical to that of one of its
ancestors or no displacement vectors can be
applied. In either case the node is called a leaf.
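
The construction rule can be sketched as a small tree builder. The code below is an illustration under stated assumptions: the `Node` class, `build_tree`, `leaves`, and the three-arc VAS used to exercise them are hypothetical and are not taken from Figure 17 or Figure 18. The stopping rule in the text guarantees termination only when the set of reachable markings is bounded, so the sketch also carries an assumed `max_nodes` guard.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Marking = Tuple[int, ...]


@dataclass
class Node:
    marking: Marking
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_tree(m0: Marking, displacements: Dict[str, Marking],
               max_nodes: int = 10_000) -> Node:
    """Expand markings until every leaf repeats an ancestor or admits no firing."""
    root = Node(m0)
    stack: List[Tuple[Node, FrozenSet[Marking]]] = [(root, frozenset())]
    count = 1
    while stack:
        node, ancestors = stack.pop()
        if node.marking in ancestors:
            continue  # leaf: marking identical to that of an ancestor
        path = ancestors | {node.marking}
        for name, d in displacements.items():
            candidate = tuple(m + x for m, x in zip(node.marking, d))
            if all(c >= 0 for c in candidate):  # keep valid markings only
                child = Node(candidate)
                node.children[name] = child
                stack.append((child, path))
                count += 1
                if count > max_nodes:
                    raise RuntimeError("tree too large; markings may be unbounded")
    return root


def leaves(node: Node) -> List[Marking]:
    """Collect the markings of all leaves below (and including) node."""
    if not node.children:
        return [node.marking]
    return [m for child in node.children.values() for m in leaves(child)]


# Hypothetical VAS: arc 0 is the entry arc, arc 2 the exit arc.
example = {
    "D1": (-1, 1, 0),   # move the token from arc 0 to arc 1
    "D2": (1, -1, 0),   # loop back from arc 1 to arc 0
    "D3": (0, -1, 1),   # move the token from arc 1 to the exit arc
}
tree = build_tree((1, 0, 0), example)
```

In this example the tree has two leaves: one repeats the root marking (the loop), and one holds a single token on the exit arc, after which no displacement vector applies.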

Properties of this tree can be used to detect
properties of the graph. For example, the leaf
(0,0,0,0,1,0,0) represents a properly terminating sequence
since there is a single token on arc 4. By contrast,
leaf (0,0,0,2,0,0,0) represents two tokens on arc 3.
No tokens are on the exit arc. This is the deadlock
situation alluded to earlier. Furthermore, depending
on the actual physical implementation of the graph,
this leaf may indicate a lost signal.
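
A check of this kind might look like the following sketch. It assumes that arcs are numbered from zero, so that arc 4 (taken here to be the exit arc) is the fifth vector component, and that the function is applied only to leaves at which no displacement vector can be applied. The function name and the category labels are hypothetical.

```python
from typing import Tuple

Marking = Tuple[int, ...]


def classify_leaf(marking: Marking, exit_arc: int) -> str:
    """Classify a leaf at which no displacement vector can be applied."""
    if marking[exit_arc] == 1 and sum(marking) == 1:
        # A single token on the exit arc and no tokens anywhere else.
        return "proper termination"
    if marking[exit_arc] == 0:
        # No token reached the exit arc; any remaining tokens are stranded.
        return "deadlock"
    # A token reached the exit arc but other tokens remain in the graph.
    return "other"


print(classify_leaf((0, 0, 0, 0, 1, 0, 0), exit_arc=4))
print(classify_leaf((0, 0, 0, 2, 0, 0, 0), exit_arc=4))
```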

Another obvious application is a simulator. The
subroutine calls produced by the ISP compiler make
the generation of a simulator particularly easy. Data
subroutines update the data structures and control
subroutines direct the flow of the simulation. A command
language allows the user to direct the simulation
and examine the state of various data structures.

It is also desirable to produce designs according
to criteria other than the traditional cost/speed
criteria. One such criterion is testability. The structure
of the final design substantially determines the
ease with which tests can be generated for the
design. A testability measure (Stephenson [74]) has
been developed that correlates well with actual test
generation effort. It is important to note that the common
representation used as input to the various
design programs is a critical feature that ensures that
the algorithm being evaluated is actually the one
being implemented, verified, or simulated by the
other design programs.

Figure 18. Portion of the Control Flow Tree for the Example
of Figure 17

Figure 19. SMCD: The Symbolic Manipulation of Computer
Descriptions


VIII.  Future  Directions 

To achieve the goal of automatic design relative to
technology, a mechanism is required that would take
the description of a module set and create the
equivalent of the ad hoc module set evaluators
currently in use. It was also noted in Section IV that
the order of physical allocation (registers, buses,
operators, etc.) is a strong function of the design
style imposed by the module set. This information
would also have to be extracted from the module set
description.

The preliminary design automation system and a
machine-relative optimizing compiler-compiler
project serve as a stepping stone to an even more
ambitious project termed the Symbolic Manipulation
of Computer Descriptions (SMCD) (Barbacci [74]),
depicted in Figure 19. There is a continual stream of
new machines spurred by the advent of minicomputers
and microprocessors. Each machine has a
different instruction set. The emergence of
microcoded systems with the option of user-defined
instruction sets has increased this flow of instruction
sets. Each new system requires supporting software,
and the amount of software grows for any individual
system as user requirements grow.

One direction in which to seek a solution to ease
the burden of software development is to relativize
the production of software to the description of the
machine. The central ingredient of this approach is
the description of computer systems in a symbolic
form, such that a range of problems can be solved
by manipulation of these descriptions.

Figure  19  depicts  the  scope  of  the  SMCD  project. 
The  ultimate  goal  would  be  to  produce  and  evaluate  a 
computer  system  from  its  behavioral  specifications, 
together  with  the  documentation  and  system 
programs.  Thus  the  delay  from  the  conception  of  a 
new  architecture  to  the  time  it  is  implemented  and 
ready for users can be significantly reduced.

M.S., Stanford University (1969)
Carnegie, 1974: Artificial Intelligence


A.  Nico  Habermann 

Professor  of  Computer  Science 
B.S., Free University, Amsterdam (1953)

M.S., Free University, Amsterdam (1957)
Ph.D., Technological University,
Eindhoven, The Netherlands (1967)
Carnegie, 1968: Operating Systems and
Programming Languages

Louis  Hageman 
Senior  Lecturer 

B.A.,  DePauw  University  (1955) 

B.S.,  Rose-Hulman  Institute  of 
Technology  (1955) 

M.S.,  University  of  Pittsburgh  (1959) 

Ph.D.,  University  of  Pittsburgh  (1962) 
Carnegie,  1973:  Numerical  Analysis 

Frederick  Hayes-Roth 
Research  Associate 
B.A.,  Harvard  University  (1969) 

M.S.,  University  of  Michigan  (1972) 

Ph.D.,  University  of  Michigan  (1974) 

Carnegie, 1974: Artificial Intelligence,
Pattern Recognition, Cognitive Psychology

Laurent  Hyafil 
Visiting  Researcher 
M.S.,  Ecole  Polytechnique  (1972) 

Ph.D.,  Universite  de  Paris  (1974) 

Carnegie,  1975:  Computational  Complexity 

Anita  K.  Jones 

Assistant  Professor  of  Computer  Science 
B.A.,  Rice  University  (1964) 

M.A.,  University  of  Texas  (1966) 

Ph.D.,  Carnegie-Mellon  University  (1973) 
Carnegie,  1968:  Programmed  Systems 

Boleslaw Kacewicz
Visiting Researcher
M.S., University of Warsaw (1974)

Carnegie, 1975: Computational Complexity
and Numerical Mathematics

Masahiko  Kida 
Visiting  Scholar 

B.S.,  Waseda  University  (1968) 

Carnegie,  1974:  Multiprocessor  Systems 


H. T. Kung

Assistant Professor of Computer Science
B.S., National Tsing Hua University,
Taiwan (1968)

Ph.D., Carnegie-Mellon University (1973)
Carnegie, 1973: Computational Complexity,
Parallel Computation, Numerical Mathematics

Victor  R.  Lesser 

Research  Computer  Scientist 

A.B., Cornell University (1966)

M.S., Stanford University (1970)

Ph.D., Stanford University (1972)

Carnegie, 1972: Parallel System
Organization for Artificial Intelligence
(e.g., Speech Understanding), Computer
Architecture (Micro-programming,
Multiprocessor Systems), Operating
Systems and Problem Decomposition for
Multiprocessors

John  W.  McCredie 

Lecturer  in  Computer  Science 
Head  of  Computation  Center 
B.E., Yale University (1962)

M.S.E.E.,  Yale  University  (1964) 

Ph.D.,  Carnegie-Mellon  University  (1972) 
Carnegie,  1968:  Analytical  Modeling, 

Simulation,  and  System  Performance 
Evaluation 

Robert  Meersman 
Visiting  Researcher 

Ph.D., Vrije Universiteit Brussel (1975)

Carnegie,  1975:  Parallel  Computation 

John  McDermott 

Visiting  Research  Associate 

B.A., St. Louis University (1966)

M.A., St. Louis University (1967)

Ph.D., University of Notre Dame (1969)
Carnegie,  1974:  Artificial  Intelligence, 

Production  Systems 

James  Moore 

Research Associate
B.S.,  Massachusetts  Institute  of 
Technology  (1964) 

Ph.D.,  Carnegie-Mellon  University  (1971) 
Carnegie,  1971:  Artificial  Intelligence  and 
Semantic  Nets 


Ian  Munro 
Visiting  Researcher 

B.A., University of New Brunswick (1968)
M.Sc.,  University  of  British  Columbia  (1969) 
Ph.D.,  University  of  Toronto  (1971) 

Carnegie,  1975:  Computational  Complexity 

Joseph  Newcomer 
Research  Associate 
B.A.,  St.  Vincent  College  (1967) 

Ph.D., Carnegie-Mellon University (1975)
Carnegie,  1975:  Operating  Systems, 
Programming  Languages 

Allen  Newell 
University  Professor 
B.S.,  Stanford  University  (1949) 

Ph.D.,  Carnegie  Institute  of  Technology 
(1957) 

Carnegie, 1961: Artificial Intelligence,
Psychology  of  Human  Thinking, 
Programming  Systems,  and  Computer 
Structures 


Raj Reddy

Professor of Computer Science
B.E., University of Madras (1958)

M.Tech., University of New South Wales (1
M.S., Stanford University (1964)

Ph.D.,  Stanford  University  (1966) 

Carnegie,  1969:  Artificial  Intelligence, 
Computer  Graphics,  and  Man-Machine 
Communications 


Mario  Schkolnick 


Assistant  Professor  of  Computer  Science 
Electrical Engineer, University of Chile
(1965)

M.S., University of California (1967)
Ph.D., University of California at Berkeley
(1969)


Carnegie,  1973:  Data  Base  Design, 
Complexity  Theory 


Daniel  Serain 
Visiting  Scholar 

M.S., L'Université de Grenoble (1972)
Carnegie, 1974: Multiprocessor
Structure,  Operating  Systems 


Mary  Shaw 

Assistant  Professor  of  Computer  Science 
B.A.,  Rice  University  (1965) 

Ph.D., Carnegie-Mellon University (1972)
Carnegie,  1971:  Programming  Systems, 
Software  Tools,  the  Programming 
Environment,  and  Concrete 
Computational  Complexity 

Linda  Shockey 
Research  Associate 
B.S.,  Ohio  State  University  (1967) 

Ph.D., Ohio State University (1973)
Carnegie, 1972: Linguistics and
Automatic  Speech  Recognition 

Daniel P. Siewiorek

Assistant Professor of Computer Science
and Electrical Engineering
B.S., University of Michigan (1968)

M.S., Stanford University (1969)

Ph.D., Stanford University (1972)

Carnegie, 1972: Computer Architecture,
Automatic Design Exploration,
Computer Descriptive Languages,
Modeling, Fault Tolerant
Computer Design


Herbert A. Simon

Richard King Mellon Professor of
Computer Science and Psychology
A.B., University of Chicago (1936)

Ph.D., University of Chicago (1943)

D.Sc. (Hon.), Case Institute of
Technology (1963)

D.Sc. (Hon.), Yale University (1963)

LL.D. (Hon.), University of Chicago (1964)
LL.D. (Hon.), McGill University (1970)

Fil.D. (Hon.), University of Lund, Sweden (19
D.Econ.Sci., Erasmus University of
Rotterdam (1973)

Carnegie,  1949:  Computer  Simulation  of 
Cognitive  Processes,  Artificial 
Intelligence,  and  Management  Science 


Shigeharu  Sugita 
Visiting  Researcher 
Ph.D., Kyoto University (1968)
Carnegie, 1975: Artificial Intelligence

Yoshiro  Tochio 
Visiting  Scholar 

B.S.,  Osaka  University,  Japan  (1969) 
Carnegie, 1974: Artificial Intelligence

Joseph F. Traub

Professor of Computer Science and
Mathematics, and Head of the
Department of Computer Science
B.S., City College of New York (1954)

Ph.D., Columbia University (1959)
Carnegie, 1971: Numerical Mathematics,
Computational Complexity, Parallel
Computation, Algorithmic Analysis

Henryk Wozniakowski

Visiting Assistant Professor
M.S., University of Warsaw (1969)

Ph.D., University of Warsaw (1972)
Carnegie, 1973: Numerical Mathematics,
Computational Complexity

William A. Wulf

Associate Professor of Computer Science
B.S., University of Illinois (1961)

M.S.E.E., University of Illinois (1963)

D.Sc., University of Virginia (1968)
Carnegie, 1968: Programming Systems:
Compiler Optimization, Operating
Systems, Systems Programming
Languages, and Multiprocessor Systems


Departmental Staff

Engineering

Mark  Adam— Technician 

William  Broadley— Manager  of  Engineering  Design 

Paolo  Coraluppi — Research  Engineer 

Mike  Keegan— Draftsman 

Tim  Kirby— Staff  Engineer 

Stan  Kriz— Engineer 

Rich  Lang— Technician 

Mike  Powell— Engineer 

Brian  Rosen— Staff  Engineer 

Ken  Stupak— Technician 

Jim  Teter— Manager  of  Engineering  Production 
Nancy  Whitaker— Technical  Clerk 

Office  Staff 

Nancy  Barron— Secretary  to  Department  head 
Mildred  Black— Secretary  to  Professor  Newell 
Judith  Brantley— Secretary  to  Business  Manager 
Beverly  Howell— Secretary  to  Dr.  Reddy 
Dorothy  Josephson— Faculty  Secretary 
Deborah  Lemmon— Department  Secretary 
Paul  Newbury  —Business  Manager 
Ruth  Ann  Sellhamer— Assistant  Business  Manager 
Susan  Sevigny— Documentation  Librarian 

Programming  and  Operations 
Patrick  Banwell— Research  Programmer 
Christopher  Cooper— Programmer 
Robert  Cronk— Research  Programmer 
Gregory  Gill— Visiting  Research  Associate 
John  Godfrey— Programmer 

Ralph  Guggenheim— Technician/Research  Assistant 
Hank  Mashburn— Senior  Systems  Analyst 
Eric  Ostrom— Programmer 
Chuck  Pierson— Research  Programmer 
Brian  Reid— Research  Programmer 
George  Robertson— Senior  Research  Programmer 
Jim  Skees— Operator 
Howard  Susman— Research  Programmer 
Harold  Van  Zoeren— Senior  Research  Programmer 
Dave  Vavra— Operator/Programmer 
Howard Wactlar— Manager of Programming and
Operations

Graduate Students

Guy T. Almes

B.A., Rice University (1972)

Mathematics and Electrical Engineering

M.S., Rice University (1972)

Electrical  Engineering 

Gideon  Ariely 

B.A., Hebrew University (1969)

Mathematics, Philosophy, Computer Science

Gerard  M.  Baudet 

Diplôme d'Ingénieur, Ecole Polytechnique (1970)
Mathematics

Diplôme d'Etudes Approfondies, Université Paris IV
(1971)

Computer Science

Doctorat de 3ème cycle, Université Paris VI (1973)
Computer  Science 

Madeline  Bauer 

A.B., Cornell University (1968)

Mathematics 

M.A.,  University  of  Michigan  (1970) 

Computing  and  Communications  Sciences 

Andrew  P.  Buchalter 

B.S., Yale University (1974)

Physics 

Roderic  G.  Cattell 

B.S.,  University  of  Illinois  (1974) 

Computer  Science 

Robert  J.  Chansler,  Jr. 

B.S.,  California  Institute  of  Technology  (1974) 
Mathematics-IS 

Douglas  W.  Clark 

B.S.,  Yale  University  (1972) 

Engineering  and  Applied  Science 

Donald  N.  Cohen 

B.S.,  Carnegie-Mellon  University  (1973) 
Mathematics 

Ellis  Cohen 

B.S.,  Drexel  Institute  of  Technology  (1970) 
Mathematics 

Lee  W.  Cooprider 

B.A.,  Oberlin  College  (1969) 

Mathematics 


William  M.  Corwin 

B.S.,  Carnegie-Mellon  University  (1972) 

Physics 

Achim  Eckert 

Diplom  Ingenieur,  Technische  Universitat  Berlin 
(1974) 

Electrical  Engineering 

David C. Eklund

B.A.,  Harvard  University  (1968) 

Applied  Mathematics 

Craig  F.  Everhart 

B.A., Wesleyan University (1974)

Physics 

Peter  Feller 

Abitur,  Gymnasium  Bad  Toelz  (1971) 

Mathematics 

Vordiplom,  Technical  University  of  Munich  (1973) 
Computer  Science 

Richard  Fennell 

B.S.,  Rensselaer  Polytechnic  Institute  (1969) 
Physics 

Lawrence  E.  Flon 

B.S.,  SUNY  at  Stony  Brook  (1972) 

Physics 

Charles  L.  Forgy 

B.S.,  University  of  Texas  at  Arlington  (1972) 
Mathematics 

John  G.  Gaschnig 

B.S.E.E.,  Massachusetts  Institute  of 
Technology  (1972) 

Computer  Science 

Henry  Goldberg 

S.B., Massachusetts Institute of Technology
(1968) 

Mathematics 

Richard H. Gumpertz

S.B.E.E.,  Massachusetts  Institute  of 

Technology  (1973) 

Electrical  Engineering 

Samuel  P.  Harbison  III 

A.B.,  Princeton  University  (1974) 

Mathematics 




Don  Heller 

B.S.,  Carnegie-Mellon  University  (1971) 
Mathematics 

Paul  N.  Hilfinger 

A.B.,  Princeton  University  (1973) 

Mathematics 

Steven  O.  Hobbs 

A.B., Dartmouth College (1969)

Mathematics

B.A., University of Michigan (1972)
Mathematics  (Computer  Science  Option) 

David  R.  Jefferson 
B.S.,  Yale  University  (1970) 

Mathematics 

Richard  Johnsson 

B.E.,  Vanderbilt  University  (1970) 

Electrical  Engineering 

Philip  Karlton 

B.A.,  University  of  California,  Santa  Barbara 
(1971) 

Mathematics 
John  R.  Kender 

B.S., University of Detroit (1970)
Mathematics

M.S., University of Michigan (1972)
Mathematics 

Paul  J.  Knueven 

Sc.B.,  Brown  University  (1969) 

Applied  Mathematics 

Donald  W.  Kosy 

B.S.,  University  of  Michigan  (1967) 

Science  Engineering 
M.S., Stanford University (1968)

Electrical  Engineering 
M.S.,  Stanford  University  (1969) 

Computer  Science 

David  A.  Lamb 

B.S.,  University  of  Waterloo  (1974) 
Computer  Science 

Bruce  W.  Leverett 

A.B.,  Harvard  University  (1973) 

Physics  and  Chemistry 


Bruce  Lowerre 

B.S.,  Case  Institute  of  Technology  (1965) 
Chemistry 

B.S.,  Case  Western  Reserve  (1970) 

Mathematics 

Madhav  Marathe 

B.S.,  University  of  Bombay  (1971) 

Physics 

M.S.,  Indian  Institute  of  Technology,  Kanpur 

(1972) 

Physics 

Karla  F.  Martin 

B.A., Western Washington State College (1967)
Mathematics,  Physics 

M.A.,  University  of  Oregon  (1969) 

Mathematics 

M.A.,  University  of  Oregon  (1972) 

Computer  Science 

Philip  H.  Mason 

B.S.,  Carnegie-Mellon  University  (1967) 
Mathematics 

Donald  L.  McCracken 

B.S.,  Carnegie-Mellon  University  (1968) 
Mathematics 

Patrick  F.  McGehearty 

B.A.,  University  of  Texas  at  Austin  (1972) 
Mathematics  and  Computer  Science 

M.A.,  University  of  Texas  at  Austin  (1974) 
Computer  Science 

Rajan  S.  Modi 

B.Tech., Indian Institute of Technology (1974)
Electrical  Engineering 

David  J.  Mostow 

A.B., Harvard University (1974)

Applied Mathematics

Joseph Newcomer

B.A., St. Vincent College (1967)

Mathematics 

John  D.  Oakley 

B.S., Harvey Mudd College (1970)

Physics 

M.S.,  University  of  Wisconsin  (1972) 

Computer  Science 


Roy  Levin 

B.S.,  Yale  University  (1970) 
Mathematics 


Ronald Ohlander

B.S., St. Mary's College (1962)

Psychology 

Crispin  S.  Perdue 

A.B., Princeton University (1973)

Independent Program

Frederick Pollack

B.S., University of Florida (1970)

Mathematics 

Keith  Price 

B.S.,  Massachusetts  Institute  of  Technology 
(1971) 

Electrical  Engineering 
Kamesh  Ramakrishna 

B.S.,  Indian  Institute  of  Technology,  Kanpur  (1974) 
Electrical  Engineering 

Elaine  Rich 

A.B., Brown University (1972)

Formal Language Theory

George Rolf

B.S., University of Nymejen (1966)

Mathematics 

M.S.,  University  of  Nymejen  (1970) 

Numerical  Analysis 

Steven  M.  Rubin 

B.S.,  Carnegie-Mellon  University  (1974) 
Mathematics 

Michael  Rychener 

A.B., Oberlin College (1969)

Mathematics 

M.S.,  Stanford  University  (1971) 

Computer  Science 

Steven  Saunders 

S.B.,  Massachusetts  Institute  of  Technology 
(1972) 

Computer  Science 
Edward  Schneider 

B.S., Carnegie-Mellon University (1970)

Mathematics


Richard  Smith 

B.S.,  Houghton  College  (1971) 

Physics  and  Mathematics 

David  K.  Stevenson 

B.A.,  Wesleyan  University  (1969) 

English  and  Mathematics 
M.A.,  University  of  Oregon  (1972) 

Mathematics 

Mark  Stickel 

B.S.,  University  of  Washington  (1969) 
Mathematics 

M.S.,  University  of  Washington  (1971) 

Computer  Science 

Richard  J.  Swan 

B.A.,  University  of  Essex  (1972) 

Computing  Science 

Walter  F.  Tichy 

Reifezeugnis,  Karlsgymnasium  Bad  Reichenhall 
(1971) 

Diplom-Vorprufung,  Technical  University  Munich 
(1973) 

Mathematics  and  Computer  Science 
Bruce  W.  Weide 

B.S.,  University  of  Toledo  (1974) 

Electrical  Engineering 

Charles  Weinstock 

B.S.,  Carnegie-Mellon  University  (1970) 
Mathematics 


Robert  W.  Schwanke 
B.S.,  Carnegie-Mellon  University  (1974) 
Mathematics  and  Computer  Science 


Publications

July 1, 1974 to June 30, 1975

These  publications  are  given  in  alphabetical  order 
according  to  the  name  of  the  first  author  listed  for 
each  publication.  In  cases  of  multiple  authorship 
where  more  than  one  author  is  in  the  Computer 
Science  Department,  a cross-reference  is  made  to 
that  first  listing  under  the  name  of  each  departmental 
author. 

No cross-references are made for non-departmental
authors.

Barbacci, M. R. and D. P. Siewiorek, "Some Aspects
of the Symbolic Manipulation of Computer
Descriptions", Second Annual Workshop on Computer
Hardware Description Languages, Technische
Hochschule, Darmstadt, West Germany,
July 1974.

Barbacci, M. R. and D. P. Siewiorek, "Some Observations
on Modular Design Technology and the
Use of Microprogramming", Infotech State of the
Art Report on Microprogramming and Systems
Architecture, Berkshire, UK, (to appear in 1975).

Barbacci, M. R., "A Comparison of Register Transfer
Languages for Describing Computers and Digital
Systems", IEEE Transactions on Computers, Vol.
C-24, No. 2, February 1975, 137-150, PB 221591.

Berliner, H. J., "A Representation of Some Mechanisms
for a Problem Solving Chess Program", to
appear in Recent Advances in Computer Chess,
Edinburgh University Press.

For  references  by  W.  Broadley,  see  D.  R.  Reddy. 

Cohen, E. S., "A Semantic Model for Parallel Systems
with Scheduling", Second ACM Symposium
on Principles of Programming Languages, Palo
Alto, Ca., January 1975.

Cooprider, L. W., F. Reymans, R. J. Courtois and
D. L. Parnas, "Information Streams Sharing a
Finite Buffer: Other Solutions", Information Processing
Letters, 3:1, July 1974, 16-21.

Eastman, C. M., J. Lividini and D. Stoker, "A Database
for Designing Large Physical Systems",
1975 National Computer Conference Proceedings,
Anaheim, Ca., 1975.

Eastman, C. M. and J. Lividini, "System Design for a
Building Description System", CIS W52 Symposium
on Computer Languages in Building,
Budapest, Hungary, April 1975.

Eastman, C. M., J. Lividini and D. Stoker, "A Data
Structure for Building Elements", CIS W52 Symposium
on Computer Languages in Building,
Budapest, Hungary, April 1975.

Eastman, C. M. and J. Lividini, "Spatial Search"
(revised), Institute of Physical Planning, Research
Report No. 55, CMU, April 1975.

Fuller, S. H., V. R. Lesser, C. G. Bell and C. Kaman,
"Microprogramming and Its Relation to Emulation
and Technology", Seventh Annual Microprogramming
Conference, Palo Alto, Ca., October 1974.

For  other  references  by  S.  H.  Fuller,  see  M.  V. 
Marathe. 

Gaschnig, J. G., "A Constraint Satisfaction Method
for Inference Making", Proc. Twelfth Annual
Allerton Conference on Circuit and System
Theory, University of Illinois at Urbana-Champaign,
October 1974.

Gilmartin, K. J., A. Newell and H. A. Simon, "A Program
Modeling Short-Term Memory Under Strategy
Control", Psychology Dept., CIP Working Paper
No. 293, CMU, March 1975.

Godfrey, J. D., J. M. Powell, and E. A. Snow, "A
Cardiac Arrhythmia Monitoring System", I.E.E.E.
Computer Society Conference, Washington, D.C.,
September 1974.

Grason, J. and D. P. Siewiorek, "A Modular Approach
to Prototype System Construction in Real-Time
Minicomputer Laboratory", COMPCON 74,
Ninth Annual IEEE Computer Society International
Conference, Washington, D.C., September 1974,
139-143.

Hayes-Roth, B. and F. Hayes-Roth, "Plasticity in
Memorial Networks", Journal of Verbal Learning
and Verbal Behavior, (to appear).

Hayes-Roth, F., "Schematic Classification Problems
and Their Solution", Pattern Recognition, 1974,
6, 105-114.

Hayes-Roth, F., "Representation of Structured Events
and Efficient Procedures for Their Recognition",
Pattern Recognition, (to appear).

Hayes-Roth, F., "An Optimal Network Representation
and Other Mechanisms for the Recognition
of Structured Events", Proc. Second International
Joint Conference on Pattern Recognition, 1974.

Hayes-Roth, F. and D. J. Mostow, "An Automatically
Compilable Recognition Network for Structured
Patterns", Proc. Fourth International Joint Conference
on Artificial Intelligence, (to appear).

Heller, D., "A Determinant Theorem with Applications
to Parallel Algorithms", SIAM J. Num. Anal.,
Vol. 11, No. 3, June 1974, 559-568.

Heller, D., "On the Efficient Computation of Recurrence
Relations", The Institute for Computer Applications
in Science and Engineering, NASA
Langley Research Center, Hampton, Va., June
1974.

Huen, W. H. and D. P. Siewiorek, "Intermodule
Protocol for Register Transfer Level Modules:
Taxonomy and Analytic Tools", Proc. Second
Annual Symposium on Computer Architecture,
Houston, Tx., February 1975.